# Runtime Data Assets

Runtime data assets are files that Stella Ops services need at runtime but that are **not produced by `dotnet publish`** or the Angular build. They must be provisioned separately — either baked into Docker images, mounted as volumes, or supplied via an init container. This directory contains the canonical inventory, acquisition scripts, and packaging tools for all such assets.

**If you are setting up Stella Ops for the first time**, read this document before running `docker compose up`. Services will start without these assets but will operate in degraded mode (no semantic search, no binary analysis, dev-only certificates).

---

## Quick reference

| Category | Required? | Size | Provisioned by |
|---|---|---|---|
| [ML model weights](#1-ml-model-weights) | Yes (for semantic search) | ~80 MB | `acquire.sh` |
| [JDK + Ghidra](#2-jdk--ghidra) | Optional (binary analysis) | ~1.6 GB | `acquire.sh` |
| [Search seed snapshots](#3-search-seed-snapshots) | Yes (first boot) | ~7 KB | Included in source |
| [Translations (i18n)](#4-translations-i18n) | Yes | ~500 KB | Baked into Angular dist |
| [Certificates and trust stores](#5-certificates-and-trust-stores) | Yes | ~50 KB | `etc/` + volume mounts |
| [Regional crypto configuration](#6-regional-crypto-configuration) | Per region | ~20 KB | Compose overlays |
| [Evidence storage](#7-evidence-storage) | Yes | Grows | Persistent named volume |
| [Vulnerability feeds](#8-vulnerability-feeds) | Yes (offline) | ~300 MB | Offline Kit (`docs/OFFLINE_KIT.md`) |

---

## 1. ML model weights

**What:** The `all-MiniLM-L6-v2` sentence-transformer model in ONNX format, used by `OnnxVectorEncoder` for semantic vector search in AdvisoryAI.

**License:** Apache-2.0 (compatible with BUSL-1.1; see `third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt`).

**Where it goes:**

```
/models/all-MiniLM-L6-v2.onnx
```

Configurable via the `KnowledgeSearch__OnnxModelPath` environment variable.
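Before wiring the path up, it can be worth confirming that the file actually holds real weights rather than a tiny placeholder. A minimal sketch, assuming the ~80 MB size quoted in the quick-reference table; the default path and the 1 MB threshold here are illustrative choices, not project settings:

```shell
#!/bin/sh
# Warn when the ONNX model is absent or far too small to be real weights.
# The path argument and the 1 MB cutoff are illustrative assumptions.
MODEL="${1:-src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx}"
if [ -f "$MODEL" ]; then
  SIZE=$(wc -c < "$MODEL")
else
  SIZE=0
fi
if [ "$SIZE" -lt 1000000 ]; then
  echo "WARNING: $MODEL missing or placeholder ($SIZE bytes); expect degraded semantic search" >&2
else
  echo "OK: $MODEL ($SIZE bytes)"
fi
```

Running this in CI before `docker build` catches the placeholder-weights case called out in the release checklist below.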
**How to acquire:**

```bash
# Option A: use the acquisition script (recommended)
./devops/runtime-assets/acquire.sh --models

# Option B: manual download
mkdir -p src/AdvisoryAI/StellaOps.AdvisoryAI/models
curl -L https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx \
  -o src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
```

**Verification:**

```bash
sha256sum src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
# Expected: see manifest.yaml for pinned digest
```

**Degraded mode:** If the model file is missing or is a placeholder, the encoder falls back to a deterministic character-ngram projection. Search works but semantic quality is significantly reduced.

**Docker / Compose mount:**

```yaml
services:
  advisory-ai-web:
    volumes:
      - ml-models:/app/models:ro

volumes:
  ml-models:
    driver: local
```

**Air-gap:** Include the `.onnx` file in the Offline Kit under `models/all-MiniLM-L6-v2.onnx`. The `acquire.sh --package` command produces a verified tarball for sneakernet transfer.

---

## 2. JDK + Ghidra

**What:** OpenJDK 17+ runtime and Ghidra 11.x installation for headless binary analysis (decompilation, BSim similarity, call-graph extraction).

**License:** OpenJDK — GPLv2+CE (Classpath Exception, allows linking); Ghidra — Apache-2.0 (NSA release).

**Required only when:** `GhidraOptions__Enabled=true` (default). Set to `false` to skip entirely if binary analysis is not needed.
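The toggle can be set through the environment like any other option; a hedged example, assuming the standard double-underscore environment-variable binding used by the other options in this document (the option name itself is from this section):

```shell
# Skip the JDK/Ghidra requirement entirely; binary analysis features
# are disabled and the service will not look for JAVA_HOME or GhidraHome.
export GhidraOptions__Enabled=false
```

In Compose this would go under the service's `environment:` block instead of a shell `export`.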
**Where it goes:**

```
/opt/java/openjdk/       # JDK installation (JAVA_HOME)
/opt/ghidra/             # Ghidra installation (GhidraOptions__GhidraHome)
/tmp/stellaops-ghidra/   # Workspace (GhidraOptions__WorkDir) — writable
```

**How to acquire:**

```bash
# Option A: use the acquisition script
./devops/runtime-assets/acquire.sh --ghidra

# Option B: manual
# JDK (Eclipse Temurin 17)
curl -L https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jre_x64_linux_hotspot_17.0.13_11.tar.gz \
  | tar -xz -C /opt/java/

# Ghidra 11.2
curl -L https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20241105.zip \
  -o ghidra.zip && unzip ghidra.zip -d /opt/ghidra/
```

**Docker:** For services that need Ghidra, use a dedicated Dockerfile stage or a sidecar data image. See `docs/modules/binary-index/ghidra-deployment.md`.

**Air-gap:** Pre-download both archives on a connected machine and include them in the Offline Kit under `tools/jdk/` and `tools/ghidra/`.

---

## 3. Search seed snapshots

**What:** Small JSON files that bootstrap the unified search index on first start. Without them, search returns empty results until live data adapters populate the index.

**Where they are:**

```
src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/
  findings.snapshot.json   (1.3 KB)
  vex.snapshot.json        (1.2 KB)
  policy.snapshot.json     (1.2 KB)
  graph.snapshot.json      (758 B)
  scanner.snapshot.json    (751 B)
  opsmemory.snapshot.json  (1.1 KB)
  timeline.snapshot.json   (824 B)
```

**How they get into the image:** The `.csproj` copies them to the output directory via MSBuild item entries, so they are included in `dotnet publish` output automatically.

**Runtime behavior:** `UnifiedSearchIndexer` loads them at startup and refreshes from live data adapters every 300 seconds (`UnifiedSearch__AutoRefreshIntervalSeconds`).
**No separate provisioning needed** unless you want to supply custom seed data, in which case mount a volume at the snapshot path and set:

```
KnowledgeSearch__UnifiedFindingsSnapshotPath=/app/snapshots/findings.snapshot.json
```

---

## 4. Translations (i18n)

**What:** JSON translation bundles for the Angular frontend, supporting 9 locales: en-US, de-DE, bg-BG, ru-RU, es-ES, fr-FR, uk-UA, zh-CN, zh-TW.

**Where they are:**

```
src/Web/StellaOps.Web/src/i18n/*.common.json
```

**How they get into the image:** Compiled into the Angular `dist/` bundle during `npm run build`. The console Docker image (`devops/docker/Dockerfile.console`) includes them automatically.

**Runtime overrides:** The backend `TranslationRegistry` supports database-backed translation overrides (priority 100) over file-based bundles (priority 10). For custom translations in offline environments, seed the database or mount override JSON files.

**No separate provisioning needed** for standard deployments.

---

## 5. Certificates and trust stores

**What:** TLS certificates, signing keys, and CA trust bundles for inter-service communication and attestation verification.

**Development defaults (not for production):**

```
etc/authority/keys/
  kestrel-dev.pfx          # Kestrel TLS (password: devpass)
  kestrel-dev.crt / .key
  ack-token-dev.pem        # Token signing key
  signing-dev.pem          # Service signing key
etc/trust-profiles/assets/
  ca.crt                   # Root CA bundle
  rekor-public.pem         # Rekor transparency log public key
```

**Compose mounts (already configured):**

```yaml
volumes:
  - ../../etc/authority/keys:/app/etc/certs:ro
  - ./combined-ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:ro
```

**Production:** Replace dev certificates with properly issued certificates. Mount as read-only volumes. See `docs/SECURITY_HARDENING_GUIDE.md`.

**Air-gap:** Include the full trust chain in the Offline Kit. For Russian deployments, include `certificates/russian_trusted_bundle.pem` (see `docs/OFFLINE_KIT.md`).

---

## 6. Regional crypto configuration

**What:** YAML configuration files that select the cryptographic profile (algorithms, key types, HSM settings) per deployment region.

**Files:**

```
etc/appsettings.crypto.international.yaml  # Default (ECDSA/RSA/EdDSA)
etc/appsettings.crypto.eu.yaml             # eIDAS qualified signatures
etc/appsettings.crypto.russia.yaml         # GOST R 34.10/34.11
etc/appsettings.crypto.china.yaml          # SM2/SM3/SM4
etc/crypto-plugins-manifest.json           # Plugin registry
```

**Selection:** Via Docker Compose overlays:

```bash
# EU deployment
docker compose -f docker-compose.stella-ops.yml \
  -f docker-compose.compliance-eu.yml up -d
```

**No separate provisioning needed** — files ship in the source tree and are selected by compose overlay. See `devops/compose/README.md` for details.

---

## 7. Evidence storage

**What:** Persistent storage for evidence bundles (SBOMs, attestations, signatures, scan proofs). Grows with usage.

**Default path:** `/data/evidence` (named volume `evidence-data`).

**Configured via:** `EvidenceLocker__ObjectStore__FileSystem__RootPath`

**Compose (already configured):**

```yaml
volumes:
  evidence-data:
    driver: local
```

**Sizing:** Plan ~1 GB per 1000 scans as a rough baseline. Monitor with the Prometheus metric `evidence_locker_storage_bytes_total`.

**Backup:** Include in the PostgreSQL backup strategy. Evidence files are content-addressed and immutable — append-only, safe to rsync.

---

## 8. Vulnerability feeds

**What:** Merged advisory feeds (OSV, GHSA, NVD 2.0, and regional feeds). Required for offline vulnerability matching.

**Provisioned by:** The Offline Update Kit (`docs/OFFLINE_KIT.md`). This is a separate, well-documented workflow. See that document for full details.

**Not covered by `acquire.sh`** — feed management is handled by the Concelier module and the Offline Kit import pipeline.

---

## Acquisition script

The `acquire.sh` script automates downloading, verifying, and staging runtime data assets. It is idempotent — safe to run multiple times.
```bash
# Acquire everything (models + Ghidra + JDK)
./devops/runtime-assets/acquire.sh --all

# Models only (for environments without binary analysis)
./devops/runtime-assets/acquire.sh --models

# Ghidra + JDK only
./devops/runtime-assets/acquire.sh --ghidra

# Package all acquired assets into a portable tarball for air-gap transfer
./devops/runtime-assets/acquire.sh --package

# Verify already-acquired assets against pinned checksums
./devops/runtime-assets/acquire.sh --verify
```

Asset checksums are pinned in `manifest.yaml` in this directory. The script verifies SHA-256 digests after every download and refuses corrupted files.

---

## Docker integration

### Option A: Bake into image (simplest)

Run `acquire.sh --models` before `docker build`. The `.csproj` copies `models/all-MiniLM-L6-v2.onnx` into the publish output automatically.

### Option B: Shared data volume (recommended for production)

Build a lightweight data image or use an init container:

```dockerfile
# Dockerfile.runtime-assets
FROM busybox:1.37
COPY models/ /data/models/
VOLUME /data/models
```

Mount in compose:

```yaml
services:
  advisory-ai-web:
    volumes:
      - runtime-assets:/app/models:ro
    depends_on:
      runtime-assets-init:
        condition: service_completed_successfully

  runtime-assets-init:
    build:
      context: .
      dockerfile: devops/runtime-assets/Dockerfile.runtime-assets
    volumes:
      - runtime-assets:/data/models

volumes:
  runtime-assets:
```

### Option C: Air-gap tarball

```bash
./devops/runtime-assets/acquire.sh --package
# Produces: out/runtime-assets/stella-ops-runtime-assets-<version>.tar.gz

# Transfer to air-gapped host, then:
tar -xzf stella-ops-runtime-assets-*.tar.gz -C /opt/stellaops/
```

---

## Checklist: before you ship a release

- [ ] `models/all-MiniLM-L6-v2.onnx` contains real weights (not the 120-byte placeholder)
- [ ] `acquire.sh --verify` passes all checksums
- [ ] Certificates are production-issued (not `*-dev.*`)
- [ ] Evidence storage volume is provisioned with adequate capacity
- [ ] Regional crypto profile is selected if applicable
- [ ] Offline Kit includes runtime assets tarball if deploying to air-gap
- [ ] `NOTICE.md` and `third-party-licenses/` are included in the image

---

## Related documentation

- Installation guide: `docs/INSTALL_GUIDE.md`
- Offline Update Kit: `docs/OFFLINE_KIT.md`
- Security hardening: `docs/SECURITY_HARDENING_GUIDE.md`
- Ghidra deployment: `docs/modules/binary-index/ghidra-deployment.md`
- LLM model bundles (separate from ONNX): `docs/modules/advisory-ai/guides/offline-model-bundles.md`
- Third-party dependencies: `docs/legal/THIRD-PARTY-DEPENDENCIES.md`
- Compose profiles: `devops/compose/README.md`