# Runtime Data Assets

Runtime data assets are files that Stella Ops services need at runtime but that
are **not produced by `dotnet publish`** or the Angular build. They must be
provisioned separately: baked into Docker images, mounted as volumes, or
supplied via an init container.

This directory contains the canonical inventory, acquisition scripts, and
packaging tools for all such assets.

**If you are setting up Stella Ops for the first time**, read this document
before running `docker compose up`. Services will start without these assets but
will operate in degraded mode (no semantic search, no binary analysis, dev-only
certificates).

---

## Quick reference

| Category | Required? | Size | Provisioned by |
|---|---|---|---|
| [ML model weights](#1-ml-model-weights) | Yes (for semantic search) | ~80 MB | `acquire.sh` |
| [JDK + Ghidra](#2-jdk--ghidra) | Optional (binary analysis) | ~1.6 GB | `acquire.sh` |
| [Search seed snapshots](#3-search-seed-snapshots) | Yes (first boot) | ~7 KB | Included in source |
| [Translations (i18n)](#4-translations-i18n) | Yes | ~500 KB | Baked into Angular dist |
| [Certificates and trust stores](#5-certificates-and-trust-stores) | Yes | ~50 KB | `etc/` + volume mounts |
| [Regional crypto configuration](#6-regional-crypto-configuration) | Per region | ~20 KB | Compose overlays |
| [Evidence storage](#7-evidence-storage) | Yes | Grows | Persistent named volume |
| [Vulnerability feeds](#8-vulnerability-feeds) | Yes (offline) | ~300 MB | Offline Kit (`docs/OFFLINE_KIT.md`) |

---

## 1. ML model weights

**What:** The `all-MiniLM-L6-v2` sentence-transformer model in ONNX format,
used by `OnnxVectorEncoder` for semantic vector search in AdvisoryAI.

**License:** Apache-2.0 (compatible with BUSL-1.1; see `third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt`).

**Where it goes:**

```
<app-root>/models/all-MiniLM-L6-v2.onnx
```

Configurable via the `KnowledgeSearch__OnnxModelPath` environment variable.

**How to acquire:**

```bash
# Option A: use the acquisition script (recommended)
./devops/runtime-assets/acquire.sh --models

# Option B: manual download
# (-f makes curl fail on HTTP errors instead of saving an error page as the model)
mkdir -p src/AdvisoryAI/StellaOps.AdvisoryAI/models
curl -fL https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx \
  -o src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
```

**Verification:**

```bash
sha256sum src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
# Expected: see manifest.yaml for pinned digest
```
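
The manual check above can be scripted so that a mismatch fails loudly. This is a minimal sketch, assuming the pinned digest has been copied out of `manifest.yaml` by hand; the helper name `verify_sha256` is illustrative and is not part of `acquire.sh`.

```bash
#!/usr/bin/env bash
# Hypothetical helper: compare a file's SHA-256 with an expected digest.
# Returns 0 on match, 1 on mismatch or missing file.
verify_sha256() {
  local file="$1" expected="$2" actual
  if [ ! -f "$file" ]; then
    echo "missing: $file" >&2
    return 1
  fi
  # sha256sum prints "<digest>  <path>"; keep only the digest column.
  actual="$(sha256sum "$file" | awk '{print $1}')"
  if [ "$actual" != "$expected" ]; then
    echo "digest mismatch for $file: got $actual" >&2
    return 1
  fi
  echo "ok: $file"
}

# Usage (replace the placeholder with the digest pinned in manifest.yaml):
# verify_sha256 src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx "<pinned-digest>"
```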

**Degraded mode:** If the model file is missing or is a placeholder, the encoder
falls back to a deterministic character-ngram projection. Search works but
semantic quality is significantly reduced.
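
Because the fallback does not stop the service from starting, it can help to check the model file before launch. The sketch below treats a missing file, or anything smaller than 1 MiB, as a placeholder. The threshold and the name `has_real_model` are assumptions chosen for illustration (the real model is roughly 80 MB, and the release checklist mentions a 120-byte placeholder).

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check: succeeds (0) when real weights appear to be
# present, fails (1) when the file is missing or suspiciously small.
has_real_model() {
  local path="$1"
  local min_bytes="${2:-1048576}"   # assumed threshold: 1 MiB
  [ -f "$path" ] || return 1
  local size
  # Arithmetic expansion also strips the padding some wc builds emit.
  size=$(( $(wc -c < "$path") ))
  [ "$size" -ge "$min_bytes" ]
}

# Usage:
# has_real_model /app/models/all-MiniLM-L6-v2.onnx \
#   || echo "model missing or placeholder: search will run in degraded mode" >&2
```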

**Docker / Compose mount:**

```yaml
services:
  advisory-ai-web:
    volumes:
      - ml-models:/app/models:ro

volumes:
  ml-models:
    driver: local
```

**Air-gap:** Include the `.onnx` file in the Offline Kit under
`models/all-MiniLM-L6-v2.onnx`. The `acquire.sh --package` command produces a
verified tarball for sneakernet transfer.

---

## 2. JDK + Ghidra

**What:** OpenJDK 17+ runtime and Ghidra 11.x installation for headless binary
analysis (decompilation, BSim similarity, call-graph extraction).

**License:** OpenJDK: GPLv2+CE (Classpath Exception, allows linking); Ghidra:
Apache-2.0 (NSA release).

**Required only when:** `GhidraOptions__Enabled=true` (default). Set to `false`
to skip entirely if binary analysis is not needed.

**Where it goes:**

```
/opt/java/openjdk/          # JDK installation (JAVA_HOME)
/opt/ghidra/                # Ghidra installation (GhidraOptions__GhidraHome)
/tmp/stellaops-ghidra/      # Workspace (GhidraOptions__WorkDir), must be writable
```
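
A quick pre-flight check of this layout can catch a missing JDK or a read-only workspace before the first analysis job fails. The following is a sketch only: `check_ghidra_layout` is a hypothetical helper, and it assumes the three locations are exposed through the variables named above, falling back to the documented defaults.

```bash
#!/usr/bin/env bash
# Hypothetical pre-flight check for the layout above. Each path can be
# overridden through the same variable the service reads.
check_ghidra_layout() {
  local java_home="${JAVA_HOME:-/opt/java/openjdk}"
  local ghidra_home="${GhidraOptions__GhidraHome:-/opt/ghidra}"
  local work_dir="${GhidraOptions__WorkDir:-/tmp/stellaops-ghidra}"
  local status=0

  [ -x "$java_home/bin/java" ] \
    || { echo "no java binary under $java_home" >&2; status=1; }

  # analyzeHeadless is the headless launcher shipped in Ghidra's support/ directory.
  [ -f "$ghidra_home/support/analyzeHeadless" ] \
    || { echo "no support/analyzeHeadless under $ghidra_home" >&2; status=1; }

  # Create the workspace if needed, then confirm it is writable.
  mkdir -p "$work_dir" 2>/dev/null
  [ -w "$work_dir" ] \
    || { echo "$work_dir is not writable" >&2; status=1; }

  return "$status"
}

# Usage:
# check_ghidra_layout || echo "binary analysis prerequisites are incomplete" >&2
```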

**How to acquire:**

```bash
# Option A: use the acquisition script
./devops/runtime-assets/acquire.sh --ghidra

# Option B: manual
# JDK (Eclipse Temurin 17). The tarball has a versioned top-level directory;
# strip it so the runtime lands directly in /opt/java/openjdk (JAVA_HOME).
mkdir -p /opt/java/openjdk
curl -fL https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jre_x64_linux_hotspot_17.0.13_11.tar.gz \
  | tar -xz --strip-components=1 -C /opt/java/openjdk

# Ghidra 11.2. The zip unpacks into a versioned subdirectory of /opt/ghidra/;
# point GhidraOptions__GhidraHome at it or move its contents up one level.
curl -fL https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20241105.zip \
  -o ghidra.zip && unzip ghidra.zip -d /opt/ghidra/
```

**Docker:** For services that need Ghidra, use a dedicated Dockerfile stage or a
sidecar data image. See `docs/modules/binary-index/ghidra-deployment.md`.

**Air-gap:** Pre-download both archives on a connected machine and include them
in the Offline Kit under `tools/jdk/` and `tools/ghidra/`.

---

## 3. Search seed snapshots

**What:** Small JSON files that bootstrap the unified search index on first
start. Without them, search returns empty results until live data adapters
populate the index.

**Where they are:**

```
src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/
  findings.snapshot.json     (1.3 KB)
  vex.snapshot.json          (1.2 KB)
  policy.snapshot.json       (1.2 KB)
  graph.snapshot.json        (758 B)
  scanner.snapshot.json      (751 B)
  opsmemory.snapshot.json    (1.1 KB)
  timeline.snapshot.json     (824 B)
```

**How they get into the image:** The `.csproj` copies them to the output
directory via `<Content>` items. They are included in `dotnet publish` output
automatically.

**Runtime behavior:** `UnifiedSearchIndexer` loads them at startup and refreshes
from live data adapters every 300 seconds (`UnifiedSearch__AutoRefreshIntervalSeconds`).

**No separate provisioning needed** unless you want to supply custom seed data,
in which case mount a volume at the snapshot path and set:

```
KnowledgeSearch__UnifiedFindingsSnapshotPath=/app/snapshots/findings.snapshot.json
```

---

## 4. Translations (i18n)

**What:** JSON translation bundles for the Angular frontend, supporting 9
locales: en-US, de-DE, bg-BG, ru-RU, es-ES, fr-FR, uk-UA, zh-CN, zh-TW.

**Where they are:**

```
src/Web/StellaOps.Web/src/i18n/*.common.json
```

**How they get into the image:** Compiled into the Angular `dist/` bundle during
`npm run build`. The console Docker image (`devops/docker/Dockerfile.console`)
includes them automatically.

**Runtime overrides:** The backend `TranslationRegistry` supports
database-backed translation overrides (priority 100) over file-based bundles
(priority 10). For custom translations in offline environments, seed the
database or mount override JSON files.

**No separate provisioning needed** for standard deployments.

---

## 5. Certificates and trust stores

**What:** TLS certificates, signing keys, and CA trust bundles for inter-service
communication and attestation verification.

**Development defaults (not for production):**

```
etc/authority/keys/
  kestrel-dev.pfx          # Kestrel TLS (password: devpass)
  kestrel-dev.crt / .key
  ack-token-dev.pem        # Token signing key
  signing-dev.pem          # Service signing key

etc/trust-profiles/assets/
  ca.crt                   # Root CA bundle
  rekor-public.pem         # Rekor transparency log public key
```

**Compose mounts (already configured):**

```yaml
volumes:
  - ../../etc/authority/keys:/app/etc/certs:ro
  - ./combined-ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:ro
```

**Production:** Replace dev certificates with properly issued certificates.
Mount as read-only volumes. See `docs/SECURITY_HARDENING_GUIDE.md`.
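
Since the development files all follow the `*-dev.*` naming pattern, a release gate can look for that pattern in whatever directory is about to be mounted. This is a sketch under that assumption only; `find_dev_certs` is a hypothetical helper, and a clean result does not prove that the remaining certificates were properly issued.

```bash
#!/usr/bin/env bash
# Hypothetical release gate: list files matching the dev naming pattern
# (*-dev.*) under a certificate directory. Returns 1 if any are found.
find_dev_certs() {
  local dir="$1" hits
  hits="$(find "$dir" -type f -name '*-dev.*' 2>/dev/null)"
  if [ -n "$hits" ]; then
    echo "development certificates found:" >&2
    echo "$hits" >&2
    return 1
  fi
}

# Usage:
# find_dev_certs /path/to/production/certs || exit 1
```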

**Air-gap:** Include the full trust chain in the Offline Kit. For Russian
deployments, include `certificates/russian_trusted_bundle.pem` (see
`docs/OFFLINE_KIT.md`).

---

## 6. Regional crypto configuration

**What:** YAML configuration files that select the cryptographic profile
(algorithms, key types, HSM settings) per deployment region.

**Files:**

```
etc/appsettings.crypto.international.yaml    # Default (ECDSA/RSA/EdDSA)
etc/appsettings.crypto.eu.yaml               # eIDAS qualified signatures
etc/appsettings.crypto.russia.yaml           # GOST R 34.10/34.11
etc/appsettings.crypto.china.yaml            # SM2/SM3/SM4
etc/crypto-plugins-manifest.json             # Plugin registry
```

**Selection:** Via Docker Compose overlays:

```bash
# EU deployment
docker compose -f docker-compose.stella-ops.yml \
  -f docker-compose.compliance-eu.yml up -d
```

**No separate provisioning needed:** the files ship in the source tree and are
selected by compose overlay. See `devops/compose/README.md` for details.

---

## 7. Evidence storage

**What:** Persistent storage for evidence bundles (SBOMs, attestations,
signatures, scan proofs). Grows with usage.

**Default path:** `/data/evidence` (named volume `evidence-data`).

**Configured via:** `EvidenceLocker__ObjectStore__FileSystem__RootPath`

**Compose (already configured):**

```yaml
volumes:
  evidence-data:
    driver: local
```

**Sizing:** Plan ~1 GB per 1000 scans as a rough baseline. Monitor with
Prometheus metric `evidence_locker_storage_bytes_total`.
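
As a worked example of that baseline, the arithmetic can be wrapped in a small helper. This is only a sketch: `estimate_evidence_gb` and its optional headroom percentage are illustrative assumptions, and real usage should be confirmed against the metric above.

```bash
#!/usr/bin/env bash
# Rough capacity estimate from the ~1 GB per 1000 scans baseline.
# Rounds up to whole gigabytes; the headroom percentage is an assumption.
estimate_evidence_gb() {
  local scans="$1"
  local headroom_pct="${2:-0}"
  local base=$(( (scans + 999) / 1000 ))                # ceil(scans / 1000)
  echo $(( (base * (100 + headroom_pct) + 99) / 100 ))  # ceil with headroom
}

estimate_evidence_gb 2500      # prints 3
estimate_evidence_gb 2500 50   # prints 5
```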

**Backup:** Back up this volume as part of the same strategy as PostgreSQL.
Evidence files are content-addressed and immutable (append-only), so they are
safe to rsync.

---

## 8. Vulnerability feeds

**What:** Merged advisory feeds (OSV, GHSA, NVD 2.0, and regional feeds).
Required for offline vulnerability matching.

**Provisioned by:** The Offline Update Kit (`docs/OFFLINE_KIT.md`). This is a
separate workflow; see that document for full details.

**Not covered by `acquire.sh`:** feed management is handled by the Concelier
module and the Offline Kit import pipeline.

---

## Acquisition script

The `acquire.sh` script automates downloading, verifying, and staging runtime
data assets. It is idempotent, so it is safe to run multiple times.

```bash
# Acquire everything (models + Ghidra + JDK)
./devops/runtime-assets/acquire.sh --all

# Models only (for environments without binary analysis)
./devops/runtime-assets/acquire.sh --models

# Ghidra + JDK only
./devops/runtime-assets/acquire.sh --ghidra

# Package all acquired assets into a portable tarball for air-gap transfer
./devops/runtime-assets/acquire.sh --package

# Verify already-acquired assets against pinned checksums
./devops/runtime-assets/acquire.sh --verify
```
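
To illustrate what an idempotent, checksum-gated download step can look like, here is a sketch. It is not the actual `acquire.sh` code: `fetch_verified` and its arguments are hypothetical, and the expected digest is assumed to be supplied by the caller.

```bash
#!/usr/bin/env bash
# Hypothetical sketch of an idempotent fetch step (not taken from acquire.sh).
# Skips the download when the destination already matches the expected digest,
# and discards a fresh download that fails verification.
fetch_verified() {
  local url="$1" dest="$2" expected="$3"

  digest_of() { sha256sum "$1" | awk '{print $1}'; }

  # Already present and intact: nothing to do.
  if [ -f "$dest" ] && [ "$(digest_of "$dest")" = "$expected" ]; then
    echo "up to date: $dest"
    return 0
  fi

  mkdir -p "$(dirname "$dest")"
  # Download to a temporary name so a partial file never replaces a good one.
  curl -fsSL "$url" -o "$dest.part" || { rm -f "$dest.part"; return 1; }

  if [ "$(digest_of "$dest.part")" != "$expected" ]; then
    echo "checksum mismatch, refusing $url" >&2
    rm -f "$dest.part"
    return 1
  fi
  mv "$dest.part" "$dest"
}
```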

Asset checksums are pinned in `manifest.yaml` in this directory. The script
verifies SHA-256 digests after every download and refuses corrupted files.

---

## Docker integration

### Option A: Bake into image (simplest)

Run `acquire.sh --models` before `docker build`. The `.csproj` copies
`models/all-MiniLM-L6-v2.onnx` into the publish output automatically.

### Option B: Shared data volume (recommended for production)

Build a lightweight data image or use an init container:

```dockerfile
# Dockerfile.runtime-assets
FROM busybox:1.37
COPY models/ /data/models/
VOLUME /data/models
```

Mount in compose:

```yaml
services:
  advisory-ai-web:
    volumes:
      - runtime-assets:/app/models:ro
    depends_on:
      runtime-assets-init:
        condition: service_completed_successfully

  runtime-assets-init:
    build:
      context: .
      dockerfile: devops/runtime-assets/Dockerfile.runtime-assets
    volumes:
      - runtime-assets:/data/models

volumes:
  runtime-assets:
```

### Option C: Air-gap tarball

```bash
./devops/runtime-assets/acquire.sh --package
# Produces: out/runtime-assets/stella-ops-runtime-assets-<date>.tar.gz
# Transfer to air-gapped host, then:
tar -xzf stella-ops-runtime-assets-*.tar.gz -C /opt/stellaops/
```

---

## Checklist: before you ship a release

- [ ] `models/all-MiniLM-L6-v2.onnx` contains real weights (not the 120-byte placeholder)
- [ ] `acquire.sh --verify` passes all checksums
- [ ] Certificates are production-issued (not `*-dev.*`)
- [ ] Evidence storage volume is provisioned with adequate capacity
- [ ] Regional crypto profile is selected if applicable
- [ ] Offline Kit includes runtime assets tarball if deploying to air-gap
- [ ] `NOTICE.md` and `third-party-licenses/` are included in the image

---

## Related documentation

- Installation guide: `docs/INSTALL_GUIDE.md`
- Offline Update Kit: `docs/OFFLINE_KIT.md`
- Security hardening: `docs/SECURITY_HARDENING_GUIDE.md`
- Ghidra deployment: `docs/modules/binary-index/ghidra-deployment.md`
- LLM model bundles (separate from ONNX): `docs/modules/advisory-ai/guides/offline-model-bundles.md`
- Third-party dependencies: `docs/legal/THIRD-PARTY-DEPENDENCIES.md`
- Compose profiles: `devops/compose/README.md`