# Runtime Data Assets

Runtime data assets are files that Stella Ops services need at runtime but that
are **not produced by `dotnet publish`** or the Angular build. They must be
provisioned separately — either baked into Docker images, mounted as volumes, or
supplied via an init container.

This directory contains the canonical inventory, acquisition scripts, and
packaging tools for all such assets.

**If you are setting up Stella Ops for the first time**, read this document
before running `docker compose up`. Services will start without these assets but
will operate in degraded mode (no semantic search, no binary analysis, dev-only
certificates).

---
## Quick reference

| Category | Required? | Size | Provisioned by |
|---|---|---|---|
| [ML model weights](#1-ml-model-weights) | Yes (for semantic search) | ~80 MB | `acquire.sh` |
| [JDK + Ghidra](#2-jdk--ghidra) | Optional (binary analysis) | ~1.6 GB | `acquire.sh` |
| [Search seed snapshots](#3-search-seed-snapshots) | Yes (first boot) | ~7 KB | Included in source |
| [Translations (i18n)](#4-translations-i18n) | Yes | ~500 KB | Baked into Angular dist |
| [Certificates and trust stores](#5-certificates-and-trust-stores) | Yes | ~50 KB | `etc/` + volume mounts |
| [Regional crypto configuration](#6-regional-crypto-configuration) | Per region | ~20 KB | Compose overlays |
| [Evidence storage](#7-evidence-storage) | Yes | Grows | Persistent named volume |
| [Vulnerability feeds](#8-vulnerability-feeds) | Yes (offline) | ~300 MB | Offline Kit (`docs/OFFLINE_KIT.md`) |

---
## 1. ML model weights

**What:** The `all-MiniLM-L6-v2` sentence-transformer model in ONNX format,
used by `OnnxVectorEncoder` for semantic vector search in AdvisoryAI.

**License:** Apache-2.0 (compatible with BUSL-1.1; see `third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt`).

**Where it goes:**

```
<app-root>/models/all-MiniLM-L6-v2.onnx
```

Configurable via the `KnowledgeSearch__OnnxModelPath` environment variable.

**How to acquire:**

```bash
# Option A: use the acquisition script (recommended)
./devops/runtime-assets/acquire.sh --models

# Option B: manual download
mkdir -p src/AdvisoryAI/StellaOps.AdvisoryAI/models
curl -L https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx \
  -o src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
```

**Verification:**

```bash
sha256sum src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
# Expected: see manifest.yaml for the pinned digest
```

**Degraded mode:** If the model file is missing or is a placeholder, the encoder
falls back to a deterministic character n-gram projection. Search works, but
semantic quality is significantly reduced.
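
A quick way to tell whether you will land in degraded mode is to look at the
file size: the real weights are roughly 80 MB, while the repository placeholder
is about 120 bytes. A minimal sketch, assuming the manual-download path above:

```shell
# Heuristic sanity check (sketch): real MiniLM weights are ~80 MB, while the
# repo placeholder is ~120 bytes. The 1 MB threshold is an arbitrary cutoff.
MODEL="src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx"
if [ -f "$MODEL" ] && [ "$(wc -c < "$MODEL")" -gt 1000000 ]; then
  echo "model present: semantic search enabled"
else
  echo "model missing or placeholder: encoder will fall back to n-gram projection"
fi
```
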
**Docker / Compose mount:**

```yaml
services:
  advisory-ai-web:
    volumes:
      - ml-models:/app/models:ro

volumes:
  ml-models:
    driver: local
```

**Air-gap:** Include the `.onnx` file in the Offline Kit under
`models/all-MiniLM-L6-v2.onnx`. The `acquire.sh --package` command produces a
verified tarball for sneakernet transfer.

---
## 2. JDK + Ghidra

**What:** OpenJDK 17+ runtime and Ghidra 11.x installation for headless binary
analysis (decompilation, BSim similarity, call-graph extraction).

**License:** OpenJDK — GPLv2+CE (Classpath Exception, which allows linking); Ghidra —
Apache-2.0 (NSA release).

**Required only when:** `GhidraOptions__Enabled=true` (the default). Set it to
`false` to skip binary analysis entirely if it is not needed.
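
A compose override is one way to turn the flag off. A sketch only; the service
name `binary-index` is an assumption, so substitute the name from your compose
file:

```yaml
# Sketch: disable binary analysis for one service.
# "binary-index" is an illustrative service name, not a shipped default.
services:
  binary-index:
    environment:
      GhidraOptions__Enabled: "false"
```
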
**Where it goes:**

```
/opt/java/openjdk/       # JDK installation (JAVA_HOME)
/opt/ghidra/             # Ghidra installation (GhidraOptions__GhidraHome)
/tmp/stellaops-ghidra/   # Workspace (GhidraOptions__WorkDir) — writable
```

**How to acquire:**

```bash
# Option A: use the acquisition script
./devops/runtime-assets/acquire.sh --ghidra

# Option B: manual
# JDK (Eclipse Temurin 17)
curl -L https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jre_x64_linux_hotspot_17.0.13_11.tar.gz \
  | tar -xz -C /opt/java/

# Ghidra 11.2
curl -L https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20241105.zip \
  -o ghidra.zip && unzip ghidra.zip -d /opt/ghidra/
```

**Docker:** For services that need Ghidra, use a dedicated Dockerfile stage or a
sidecar data image. See `docs/modules/binary-index/ghidra-deployment.md`.

**Air-gap:** Pre-download both archives on a connected machine and include them
in the Offline Kit under `tools/jdk/` and `tools/ghidra/`.

---
## 3. Search seed snapshots

**What:** Small JSON files that bootstrap the unified search index on first
start. Without them, search returns empty results until live data adapters
populate the index.

**Where they are:**

```
src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/
  findings.snapshot.json    (1.3 KB)
  vex.snapshot.json         (1.2 KB)
  policy.snapshot.json      (1.2 KB)
  graph.snapshot.json       (758 B)
  scanner.snapshot.json     (751 B)
  opsmemory.snapshot.json   (1.1 KB)
  timeline.snapshot.json    (824 B)
```

**How they get into the image:** The `.csproj` copies them to the output
directory via `<Content>` items. They are included in `dotnet publish` output
automatically.

**Runtime behavior:** `UnifiedSearchIndexer` loads them at startup and refreshes
from live data adapters every 300 seconds (`UnifiedSearch__AutoRefreshIntervalSeconds`).

**No separate provisioning needed** unless you want to supply custom seed data,
in which case mount a volume at the snapshot path and set:

```
KnowledgeSearch__UnifiedFindingsSnapshotPath=/app/snapshots/findings.snapshot.json
```
---

## 4. Translations (i18n)

**What:** JSON translation bundles for the Angular frontend, supporting 9
locales: en-US, de-DE, bg-BG, ru-RU, es-ES, fr-FR, uk-UA, zh-CN, zh-TW.

**Where they are:**

```
src/Web/StellaOps.Web/src/i18n/*.common.json
```

**How they get into the image:** Compiled into the Angular `dist/` bundle during
`npm run build`. The console Docker image (`devops/docker/Dockerfile.console`)
includes them automatically.

**Runtime overrides:** The backend `TranslationRegistry` supports
database-backed translation overrides (priority 100) over file-based bundles
(priority 10). For custom translations in offline environments, seed the
database or mount override JSON files.

**No separate provisioning needed** for standard deployments.

---
## 5. Certificates and trust stores

**What:** TLS certificates, signing keys, and CA trust bundles for inter-service
communication and attestation verification.

**Development defaults (not for production):**

```
etc/authority/keys/
  kestrel-dev.pfx          # Kestrel TLS (password: devpass)
  kestrel-dev.crt / .key
  ack-token-dev.pem        # Token signing key
  signing-dev.pem          # Service signing key

etc/trust-profiles/assets/
  ca.crt                   # Root CA bundle
  rekor-public.pem         # Rekor transparency log public key
```

**Compose mounts (already configured):**

```yaml
volumes:
  - ../../etc/authority/keys:/app/etc/certs:ro
  - ./combined-ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:ro
```

**Production:** Replace dev certificates with properly issued certificates and
mount them as read-only volumes. See `docs/SECURITY_HARDENING_GUIDE.md`.
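
Before mounting a production certificate, it is worth checking its validity
window. A sketch using standard `openssl` tooling; the demo generates a
throwaway self-signed certificate purely so the commands are runnable, so point
them at your real certificate instead:

```shell
# Throwaway self-signed cert for illustration only. Substitute your issued cert.
openssl req -x509 -newkey rsa:2048 -nodes -keyout /tmp/demo.key \
  -out /tmp/demo.crt -days 365 -subj "/CN=demo" 2>/dev/null

# Inspect subject and expiry before mounting.
openssl x509 -in /tmp/demo.crt -noout -subject -enddate

# Fail fast if the certificate expires within 30 days.
openssl x509 -in /tmp/demo.crt -noout -checkend $((30*24*3600)) \
  && echo "OK: valid for at least 30 more days"
```
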

**Air-gap:** Include the full trust chain in the Offline Kit. For Russian
deployments, include `certificates/russian_trusted_bundle.pem` (see
`docs/OFFLINE_KIT.md`).

---
## 6. Regional crypto configuration

**What:** YAML configuration files that select the cryptographic profile
(algorithms, key types, HSM settings) per deployment region.

**Files:**

```
etc/appsettings.crypto.international.yaml   # Default (ECDSA/RSA/EdDSA)
etc/appsettings.crypto.eu.yaml              # eIDAS qualified signatures
etc/appsettings.crypto.russia.yaml          # GOST R 34.10/34.11
etc/appsettings.crypto.china.yaml           # SM2/SM3/SM4
etc/crypto-plugins-manifest.json            # Plugin registry
```

**Selection:** Via Docker Compose overlays:

```bash
# EU deployment
docker compose -f docker-compose.stella-ops.yml \
  -f docker-compose.compliance-eu.yml up -d
```

**No separate provisioning needed** — files ship in the source tree and are
selected by compose overlay. See `devops/compose/README.md` for details.

---
## 7. Evidence storage

**What:** Persistent storage for evidence bundles (SBOMs, attestations,
signatures, scan proofs). Grows with usage.

**Default path:** `/data/evidence` (named volume `evidence-data`).

**Configured via:** `EvidenceLocker__ObjectStore__FileSystem__RootPath`

**Compose (already configured):**

```yaml
volumes:
  evidence-data:
    driver: local
```

**Sizing:** Plan ~1 GB per 1000 scans as a rough baseline. Monitor with the
Prometheus metric `evidence_locker_storage_bytes_total`.
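
That baseline turns into a quick capacity estimate. A sketch: the scan rate and
retention figures below are illustrative inputs, not recommendations:

```shell
# Back-of-envelope sizing from the ~1 GB per 1000 scans baseline above.
# SCANS_PER_DAY and RETENTION_DAYS are made-up example inputs.
SCANS_PER_DAY=200
RETENTION_DAYS=365
echo "plan for ~$(( SCANS_PER_DAY * RETENTION_DAYS / 1000 )) GB of evidence storage"
# prints: plan for ~73 GB of evidence storage
```
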

**Backup:** Include it in the PostgreSQL backup strategy. Evidence files are
content-addressed and immutable — append-only, so they are safe to rsync.

---

## 8. Vulnerability feeds

**What:** Merged advisory feeds (OSV, GHSA, NVD 2.0, and regional feeds).
Required for offline vulnerability matching.

**Provisioned by:** The Offline Update Kit (`docs/OFFLINE_KIT.md`). This is a
separate, well-documented workflow. See that document for full details.

**Not covered by `acquire.sh`** — feed management is handled by the Concelier
module and the Offline Kit import pipeline.

---
## Acquisition script

The `acquire.sh` script automates downloading, verifying, and staging runtime
data assets. It is idempotent — safe to run multiple times.

```bash
# Acquire everything (models + Ghidra + JDK)
./devops/runtime-assets/acquire.sh --all

# Models only (for environments without binary analysis)
./devops/runtime-assets/acquire.sh --models

# Ghidra + JDK only
./devops/runtime-assets/acquire.sh --ghidra

# Package all acquired assets into a portable tarball for air-gap transfer
./devops/runtime-assets/acquire.sh --package

# Verify already-acquired assets against pinned checksums
./devops/runtime-assets/acquire.sh --verify
```
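
The shape of the `--verify` step can be sketched in a few lines of shell. This
is not the actual `acquire.sh` implementation, and the digest below is simply
the SHA-256 of empty input so the example runs anywhere; the real pinned
digests live in `manifest.yaml`:

```shell
# Sketch of checksum verification, not the actual acquire.sh code.
# "expected" would normally be read from manifest.yaml; this value is the
# SHA-256 of empty input so the example is self-contained.
expected="e3b0c44298fc1c149afbf4c8996fb92427ae41e4649b934ca495991b7852b855"
actual=$(sha256sum /dev/null | awk '{print $1}')
if [ "$actual" = "$expected" ]; then
  echo "checksum OK"
else
  echo "checksum MISMATCH: refusing file" >&2
fi
```
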
Asset checksums are pinned in `manifest.yaml` in this directory. The script
verifies SHA-256 digests after every download and refuses corrupted files.

---
## Docker integration

### Option A: Bake into image (simplest)

Run `acquire.sh --models` before `docker build`. The `.csproj` copies
`models/all-MiniLM-L6-v2.onnx` into the publish output automatically.

### Option B: Shared data volume (recommended for production)

Build a lightweight data image or use an init container:

```dockerfile
# Dockerfile.runtime-assets
FROM busybox:1.37
COPY models/ /data/models/
VOLUME /data/models
```
Mount in compose:

```yaml
services:
  advisory-ai-web:
    volumes:
      - runtime-assets:/app/models:ro
    depends_on:
      runtime-assets-init:
        condition: service_completed_successfully

  runtime-assets-init:
    build:
      context: .
      dockerfile: devops/runtime-assets/Dockerfile.runtime-assets
    volumes:
      - runtime-assets:/data/models

volumes:
  runtime-assets:
```
### Option C: Air-gap tarball

```bash
./devops/runtime-assets/acquire.sh --package
# Produces: out/runtime-assets/stella-ops-runtime-assets-<date>.tar.gz

# Transfer to the air-gapped host, then:
tar -xzf stella-ops-runtime-assets-*.tar.gz -C /opt/stellaops/
```

---
## Checklist: before you ship a release

- [ ] `models/all-MiniLM-L6-v2.onnx` contains real weights (not the 120-byte placeholder)
- [ ] `acquire.sh --verify` passes all checksums
- [ ] Certificates are production-issued (not `*-dev.*`)
- [ ] Evidence storage volume is provisioned with adequate capacity
- [ ] Regional crypto profile is selected if applicable
- [ ] Offline Kit includes the runtime-assets tarball if deploying to an air gap
- [ ] `NOTICE.md` and `third-party-licenses/` are included in the image

---
## Related documentation

- Installation guide: `docs/INSTALL_GUIDE.md`
- Offline Update Kit: `docs/OFFLINE_KIT.md`
- Security hardening: `docs/SECURITY_HARDENING_GUIDE.md`
- Ghidra deployment: `docs/modules/binary-index/ghidra-deployment.md`
- LLM model bundles (separate from ONNX): `docs/modules/advisory-ai/guides/offline-model-bundles.md`
- Third-party dependencies: `docs/legal/THIRD-PARTY-DEPENDENCIES.md`
- Compose profiles: `devops/compose/README.md`