# Runtime Data Assets

Runtime data assets are files that Stella Ops services need at runtime but that
are **not produced by `dotnet publish`** or the Angular build. They must be
provisioned separately: baked into Docker images, mounted as volumes, or
supplied via an init container.

This directory contains the canonical inventory, acquisition scripts, and
packaging tools for all such assets.

**If you are setting up Stella Ops for the first time**, read this document
before running `docker compose up`. Services will start without these assets but
will operate in degraded mode (no semantic search, no binary analysis, dev-only
certificates).

---

## Quick reference

| Category | Required? | Size | Provisioned by |
|---|---|---|---|
| [ML model weights](#1-ml-model-weights) | Yes (for semantic search) | ~80 MB | `acquire.sh` |
| [JDK + Ghidra](#2-jdk--ghidra) | Optional (binary analysis) | ~1.6 GB | `acquire.sh` |
| [Search seed snapshots](#3-search-seed-snapshots) | Yes (first boot) | ~7 KB | Included in source |
| [Translations (i18n)](#4-translations-i18n) | Yes | ~500 KB | Baked into Angular dist |
| [Certificates and trust stores](#5-certificates-and-trust-stores) | Yes | ~50 KB | `etc/` + volume mounts |
| [Regional crypto configuration](#6-regional-crypto-configuration) | Per region | ~20 KB | Compose overlays |
| [Evidence storage](#7-evidence-storage) | Yes | Grows | Persistent named volume |
| [Vulnerability feeds](#8-vulnerability-feeds) | Yes (offline) | ~300 MB | Offline Kit (`docs/OFFLINE_KIT.md`) |

---

## 1. ML model weights

**What:** The `all-MiniLM-L6-v2` sentence-transformer model in ONNX format,
used by `OnnxVectorEncoder` for semantic vector search in AdvisoryAI.

**License:** Apache-2.0 (compatible with BUSL-1.1; see `third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt`).

**Where it goes:**

```
<app-root>/models/all-MiniLM-L6-v2.onnx
```

The path is configurable via the `KnowledgeSearch__OnnxModelPath` environment variable.

**How to acquire:**

```bash
# Option A: use the acquisition script (recommended)
./devops/runtime-assets/acquire.sh --models

# Option B: manual download (-f makes curl fail on HTTP errors instead of
# saving an error page as the model file)
mkdir -p src/AdvisoryAI/StellaOps.AdvisoryAI/models
curl -fL https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx \
  -o src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
```

**Verification:**

```bash
sha256sum src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
# Compare the output with the pinned digest in manifest.yaml (this directory)
```

**Degraded mode:** If the model file is missing or is a placeholder, the encoder
falls back to a deterministic character-ngram projection. Search works but
semantic quality is significantly reduced.
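
To find out before startup whether a node will run in degraded mode, a small probe can inspect the model file. The sketch below is hypothetical and is not part of `acquire.sh` or the service. It resolves the path from `KnowledgeSearch__OnnxModelPath` and falls back to `/app/models/all-MiniLM-L6-v2.onnx`, assuming `/app` is the application root (as in the Compose mount for this service). It treats any file of 120 bytes or less as the placeholder mentioned in the release checklist. The encoder's own placeholder detection may use a different rule.

```sh
#!/bin/sh
# Hypothetical pre-start probe for the ONNX model used by OnnxVectorEncoder.
# Assumptions: /app is the application root, and a file of at most
# PLACEHOLDER_MAX_BYTES bytes is the placeholder rather than real weights.
PLACEHOLDER_MAX_BYTES=120

# Prints "missing", "placeholder", or "ok" for the model at $1, or at the
# configured/default path when no argument is given.
model_status() {
  path="${1:-${KnowledgeSearch__OnnxModelPath:-/app/models/all-MiniLM-L6-v2.onnx}}"
  if [ ! -f "$path" ]; then
    echo "missing"
    return 0
  fi
  # wc -c reports the byte count; tr strips the padding some wc versions add.
  size=$(wc -c < "$path" | tr -d ' ')
  if [ "$size" -le "$PLACEHOLDER_MAX_BYTES" ]; then
    echo "placeholder"
  else
    echo "ok"
  fi
}

# Warns on stderr and returns non-zero when search would fall back to n-grams.
warn_if_degraded() {
  status=$(model_status "$@")
  if [ "$status" != "ok" ]; then
    echo "WARNING: ONNX model is $status; search will run in degraded mode" >&2
    return 1
  fi
}
```

A container entrypoint could call `warn_if_degraded` before starting the service. Because it returns non-zero instead of exiting, the caller can decide whether degraded search is acceptable.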

**Docker / Compose mount:**

```yaml
services:
  advisory-ai-web:
    volumes:
      - ml-models:/app/models:ro

volumes:
  ml-models:
    driver: local
```

**Air-gap:** Include the `.onnx` file in the Offline Kit under
`models/all-MiniLM-L6-v2.onnx`. The `acquire.sh --package` command produces a
verified tarball for sneakernet transfer.

---

## 2. JDK + Ghidra

**What:** OpenJDK 17+ runtime and Ghidra 11.x installation for headless binary
analysis (decompilation, BSim similarity, call-graph extraction).

**License:** OpenJDK: GPLv2+CE (Classpath Exception, allows linking); Ghidra:
Apache-2.0 (NSA release).

**Required only when:** `GhidraOptions__Enabled=true` (default). Set it to `false`
to skip this asset entirely if binary analysis is not needed.

**Where it goes:**

```
/opt/java/openjdk/        # JDK installation (JAVA_HOME)
/opt/ghidra/              # Ghidra installation (GhidraOptions__GhidraHome)
/tmp/stellaops-ghidra/    # Workspace (GhidraOptions__WorkDir), must be writable
```

**How to acquire:**

```bash
# Option A: use the acquisition script
./devops/runtime-assets/acquire.sh --ghidra

# Option B: manual
# JDK (Eclipse Temurin 17 JRE), unpacked directly into JAVA_HOME;
# --strip-components=1 drops the archive's top-level jdk-* directory
mkdir -p /opt/java/openjdk
curl -fL https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jre_x64_linux_hotspot_17.0.13_11.tar.gz \
  | tar -xz -C /opt/java/openjdk --strip-components=1

# Ghidra 11.2; the zip contains a single ghidra_<version>_PUBLIC/ directory,
# which is moved to the GhidraOptions__GhidraHome location
curl -fL https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20241105.zip \
  -o ghidra.zip
unzip -q ghidra.zip -d /tmp/ghidra-unpack
mv /tmp/ghidra-unpack/ghidra_*_PUBLIC /opt/ghidra
```
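
After installation, a quick preflight can confirm the layout from **Where it goes** before you leave `GhidraOptions__Enabled` at `true`. This is a hypothetical sketch, not a shipped tool. It reads `JAVA_HOME`, `GhidraOptions__GhidraHome`, and `GhidraOptions__WorkDir`, defaulting to the paths listed above. It also checks for `support/analyzeHeadless`, the launcher that Ghidra distributions provide for headless analysis.

```sh
#!/bin/sh
# Hypothetical preflight for headless Ghidra analysis.
# Defaults mirror the "Where it goes" layout; override via the same variables.
ghidra_preflight() {
  java_home="${JAVA_HOME:-/opt/java/openjdk}"
  ghidra_home="${GhidraOptions__GhidraHome:-/opt/ghidra}"
  work_dir="${GhidraOptions__WorkDir:-/tmp/stellaops-ghidra}"
  rc=0

  # The JVM launcher must exist and be executable.
  if [ ! -x "$java_home/bin/java" ]; then
    echo "FAIL: no executable java under $java_home/bin" >&2
    rc=1
  fi

  # analyzeHeadless is Ghidra's launcher for non-GUI analysis.
  if [ ! -x "$ghidra_home/support/analyzeHeadless" ]; then
    echo "FAIL: analyzeHeadless not found under $ghidra_home/support" >&2
    rc=1
  fi

  # The workspace must exist (created here if possible) and be writable.
  mkdir -p "$work_dir" 2>/dev/null
  if [ ! -w "$work_dir" ]; then
    echo "FAIL: work dir $work_dir is not writable" >&2
    rc=1
  fi

  return "$rc"
}
```

The function reports every failing check instead of stopping at the first one, so a single run shows everything that still needs fixing.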

**Docker:** For services that need Ghidra, use a dedicated Dockerfile stage or a
sidecar data image. See `docs/modules/binary-index/ghidra-deployment.md`.

**Air-gap:** Pre-download both archives on a connected machine and include them
in the Offline Kit under `tools/jdk/` and `tools/ghidra/`.

---

## 3. Search seed snapshots

**What:** Small JSON files that bootstrap the unified search index on first
start. Without them, search returns empty results until live data adapters
populate the index.

**Where they are:**

```
src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/
  findings.snapshot.json    (1.3 KB)
  vex.snapshot.json         (1.2 KB)
  policy.snapshot.json      (1.2 KB)
  graph.snapshot.json       (758 B)
  scanner.snapshot.json     (751 B)
  opsmemory.snapshot.json   (1.1 KB)
  timeline.snapshot.json    (824 B)
```

**How they get into the image:** The `.csproj` copies them to the output
directory via `<Content>` items, so they are included in `dotnet publish` output
automatically.

**Runtime behavior:** `UnifiedSearchIndexer` loads them at startup and refreshes
from live data adapters every 300 seconds (`UnifiedSearch__AutoRefreshIntervalSeconds`).

**No separate provisioning needed** unless you want to supply custom seed data.
In that case, mount a volume containing your snapshot files and point the
corresponding path setting at it. For example, for the findings snapshot:

```
KnowledgeSearch__UnifiedFindingsSnapshotPath=/app/snapshots/findings.snapshot.json
```
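
When you supply custom seed data, checking that every expected snapshot is present and non-empty avoids a partly empty index on first start. This is a hypothetical sketch. The file names come from the listing above, but the mount directory (`/app/snapshots`, taken from the example path) is an assumption. The check does not validate the JSON content.

```sh
#!/bin/sh
# Hypothetical check that all seven seed snapshots exist and are non-empty.
# File names match the Snapshots/ directory listed above.
SNAPSHOTS="findings vex policy graph scanner opsmemory timeline"

# Prints the number of missing or empty snapshot files in directory $1.
check_snapshots() {
  dir="${1:-/app/snapshots}"
  missing=0
  for name in $SNAPSHOTS; do
    file="$dir/$name.snapshot.json"
    # -s is true only for files that exist and have a size greater than zero.
    if [ ! -s "$file" ]; then
      echo "missing or empty: $file" >&2
      missing=$((missing + 1))
    fi
  done
  echo "$missing"
}
```

`check_snapshots /app/snapshots` prints `0` when all seven files are in place. Any other number means some indexes will stay empty until the adapters refresh.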

---

## 4. Translations (i18n)

**What:** JSON translation bundles for the Angular frontend, supporting 9
locales: en-US, de-DE, bg-BG, ru-RU, es-ES, fr-FR, uk-UA, zh-CN, zh-TW.

**Where they are:**

```
src/Web/StellaOps.Web/src/i18n/*.common.json
```

**How they get into the image:** Compiled into the Angular `dist/` bundle during
`npm run build`. The console Docker image (`devops/docker/Dockerfile.console`)
includes them automatically.

**Runtime overrides:** The backend `TranslationRegistry` supports
database-backed translation overrides (priority 100), which take precedence over
file-based bundles (priority 10). For custom translations in offline environments,
seed the database or mount override JSON files.
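
The precedence rule works per key: the highest-priority source that defines a key wins. The hypothetical sketch below models that rule in shell. Each source is passed as `PRIORITY:FILE`, and each file holds `key=value` lines. It only illustrates the ordering and does not reflect `TranslationRegistry`'s actual storage format or API.

```sh
#!/bin/sh
# Hypothetical model of priority-based translation lookup.
# translate KEY PRIORITY:FILE...
#   Prints the value from the highest-priority source that defines KEY,
#   or KEY itself when no source defines it (a common i18n fallback).
# Note: file paths must not contain spaces (sources are split on whitespace).
translate() {
  key="$1"; shift
  # Order sources by numeric priority, highest first.
  for src in $(printf '%s\n' "$@" | sort -t: -k1,1nr); do
    file="${src#*:}"
    # Match lines that start with "KEY=" and print everything after the "=".
    value=$(awk -v k="$key" 'index($0, k "=") == 1 { print substr($0, length(k) + 2); exit }' "$file" 2>/dev/null)
    if [ -n "$value" ]; then
      printf '%s\n' "$value"
      return 0
    fi
  done
  printf '%s\n' "$key"
}
```

With a database export at priority 100 and a file bundle at priority 10, a call like `translate common.save 10:bundle.txt 100:db.txt` returns the override when one exists and the bundle value otherwise.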

**No separate provisioning needed** for standard deployments.

---

## 5. Certificates and trust stores

**What:** TLS certificates, signing keys, and CA trust bundles for inter-service
communication and attestation verification.

**Development defaults (not for production):**

```
etc/authority/keys/
  kestrel-dev.pfx          # Kestrel TLS (password: devpass)
  kestrel-dev.crt / .key
  ack-token-dev.pem        # Token signing key
  signing-dev.pem          # Service signing key

etc/trust-profiles/assets/
  ca.crt                   # Root CA bundle
  rekor-public.pem         # Rekor transparency log public key
```

**Compose mounts (already configured):**

```yaml
volumes:
  - ../../etc/authority/keys:/app/etc/certs:ro
  - ./combined-ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:ro
```

**Production:** Replace dev certificates with properly issued certificates.
Mount as read-only volumes. See `docs/SECURITY_HARDENING_GUIDE.md`.
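
All of the development files above follow the `*-dev.*` naming pattern, so a filename scan can catch leftovers before go-live. This is a minimal sketch that assumes the keys directory is mounted at `/app/etc/certs`, as in the Compose example. It checks names only and cannot tell whether a renamed file is still a development key.

```sh
#!/bin/sh
# Hypothetical scan for development certificates/keys left in a mounted directory.
# Prints one path per line for every file named like kestrel-dev.pfx or signing-dev.pem.
find_dev_certs() {
  dir="${1:-/app/etc/certs}"
  # -type f limits results to regular files; -name matches the basename only.
  find "$dir" -type f -name '*-dev.*' | sort
}

# Returns non-zero (and lists offenders on stderr) if any dev files are present.
assert_no_dev_certs() {
  found=$(find_dev_certs "$@")
  if [ -n "$found" ]; then
    echo "Development certificates found; replace before production:" >&2
    printf '%s\n' "$found" >&2
    return 1
  fi
}
```

Running `assert_no_dev_certs /app/etc/certs` in a release pipeline covers the "Certificates are production-issued" item in the release checklist.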

**Air-gap:** Include the full trust chain in the Offline Kit. For Russian
deployments, include `certificates/russian_trusted_bundle.pem` (see
`docs/OFFLINE_KIT.md`).

---

## 6. Regional crypto configuration

**What:** YAML configuration files that select the cryptographic profile
(algorithms, key types, HSM settings) per deployment region.

**Files:**

```
etc/appsettings.crypto.international.yaml   # Default (ECDSA/RSA/EdDSA)
etc/appsettings.crypto.eu.yaml              # eIDAS qualified signatures
etc/appsettings.crypto.russia.yaml          # GOST R 34.10/34.11
etc/appsettings.crypto.china.yaml           # SM2/SM3/SM4
etc/crypto-plugins-manifest.json            # Plugin registry
```

**Selection:** Via Docker Compose overlays:

```bash
# EU deployment
docker compose -f docker-compose.stella-ops.yml \
  -f docker-compose.compliance-eu.yml up -d
```

**No separate provisioning needed:** files ship in the source tree and are
selected by compose overlay. See `devops/compose/README.md` for details.

---

## 7. Evidence storage

**What:** Persistent storage for evidence bundles (SBOMs, attestations,
signatures, scan proofs). Grows with usage.

**Default path:** `/data/evidence` (named volume `evidence-data`).

**Configured via:** `EvidenceLocker__ObjectStore__FileSystem__RootPath`

**Compose (already configured):**

```yaml
volumes:
  evidence-data:
    driver: local
```

**Sizing:** Plan ~1 GB per 1000 scans as a rough baseline. Monitor with the
Prometheus metric `evidence_locker_storage_bytes_total`.
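
The baseline translates directly into a capacity estimate. The helper below is a hypothetical sketch. It applies the ~1 GB per 1000 scans rule of thumb from this section, adds a headroom percentage (the 50% default is an assumption to tune to your retention policy), and rounds up to whole gigabytes.

```sh
#!/bin/sh
# Hypothetical capacity estimate for the evidence-data volume, based on the
# rough baseline of ~1 GB per 1000 scans (i.e. ~1 MB per scan, 1 GB = 1000 MB).
# Usage: evidence_gb SCANS [HEADROOM_PERCENT]
evidence_gb() {
  scans="$1"
  headroom="${2:-50}"   # assumed default headroom; tune to your retention policy
  # Integer arithmetic in MB: one scan ~= 1 MB, plus the headroom percentage.
  total_mb=$(( scans * (100 + headroom) / 100 ))
  # Round up to whole gigabytes.
  echo $(( (total_mb + 999) / 1000 ))
}
```

For example, `evidence_gb 20000` prints `30`: 20,000 scans at ~1 MB each, plus 50% headroom, rounded up to whole gigabytes.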

**Backup:** Include this volume in the same backup strategy as PostgreSQL.
Evidence files are content-addressed and immutable (append-only), so they are
safe to rsync.

---

## 8. Vulnerability feeds

**What:** Merged advisory feeds (OSV, GHSA, NVD 2.0, and regional feeds).
Required for offline vulnerability matching.

**Provisioned by:** The Offline Update Kit (`docs/OFFLINE_KIT.md`). This is a
separate, well-documented workflow. See that document for full details.

**Not covered by `acquire.sh`:** feed management is handled by the Concelier
module and the Offline Kit import pipeline.

---

## Acquisition script

The `acquire.sh` script automates downloading, verifying, and staging runtime
data assets. It is idempotent, so it is safe to run multiple times.

```bash
# Acquire everything (models + Ghidra + JDK)
./devops/runtime-assets/acquire.sh --all

# Models only (for environments without binary analysis)
./devops/runtime-assets/acquire.sh --models

# Ghidra + JDK only
./devops/runtime-assets/acquire.sh --ghidra

# Package all acquired assets into a portable tarball for air-gap transfer
./devops/runtime-assets/acquire.sh --package

# Verify already-acquired assets against pinned checksums
./devops/runtime-assets/acquire.sh --verify
```

Asset checksums are pinned in `manifest.yaml` in this directory. The script
verifies the SHA-256 digest after every download and rejects any file whose
digest does not match.
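
The verify-then-stage pattern described above can be sketched as follows. This is not the actual `acquire.sh` code. The function name, its argument order, and passing the expected digest directly (instead of parsing `manifest.yaml`, whose format is not shown here) are all assumptions. Staging stays idempotent by skipping files whose on-disk digest already matches. A download is moved into place only after its SHA-256 digest checks out.

```sh
#!/bin/sh
# Hypothetical verify-then-stage step: SRC is a freshly downloaded file,
# DEST is its final location, EXPECTED is the pinned SHA-256 hex digest.
sha256_of() {
  sha256sum "$1" | cut -d' ' -f1
}

stage_verified() {
  src="$1"; dest="$2"; expected="$3"

  # Idempotency: an already-staged file with the right digest is left alone.
  if [ -f "$dest" ] && [ "$(sha256_of "$dest")" = "$expected" ]; then
    echo "up-to-date: $dest"
    return 0
  fi

  actual=$(sha256_of "$src" 2>/dev/null)
  if [ "$actual" != "$expected" ]; then
    echo "digest mismatch for $src: expected $expected, got ${actual:-none}" >&2
    rm -f "$src"   # never stage or keep a corrupted download
    return 1
  fi

  mkdir -p "$(dirname "$dest")"
  # Copy to a temporary name next to DEST, then rename, so readers never
  # observe a partially written file.
  cp "$src" "$dest.tmp" && mv "$dest.tmp" "$dest"
  echo "staged: $dest"
}
```

Renaming within the destination directory keeps the final step a single `mv`. A service that starts while staging is in progress therefore sees either the old file or the new one, never a partial copy.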

---

## Docker integration

### Option A: Bake into image (simplest)

Run `acquire.sh --models` before `docker build`. The `.csproj` copies
`models/all-MiniLM-L6-v2.onnx` into the publish output automatically.

### Option B: Shared data volume (recommended for production)

Build a lightweight data image or use an init container:

```dockerfile
# Dockerfile.runtime-assets
FROM busybox:1.37
COPY models/ /data/models/
VOLUME /data/models
```

Mount in compose:

```yaml
services:
  advisory-ai-web:
    volumes:
      - runtime-assets:/app/models:ro
    depends_on:
      runtime-assets-init:
        condition: service_completed_successfully

  runtime-assets-init:
    build:
      context: .
      dockerfile: devops/runtime-assets/Dockerfile.runtime-assets
    volumes:
      - runtime-assets:/data/models

volumes:
  runtime-assets:
```

Docker copies the image's `/data/models` content into a named volume only when
that volume is empty. After updating the models, remove the volume before
re-running the init container.

### Option C: Air-gap tarball

```bash
./devops/runtime-assets/acquire.sh --package
# Produces: out/runtime-assets/stella-ops-runtime-assets-<date>.tar.gz
# Transfer to air-gapped host, then:
tar -xzf stella-ops-runtime-assets-*.tar.gz -C /opt/stellaops/
```
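
Before extracting on the air-gapped host, listing the archive confirms that it contains the expected files. This is a minimal sketch built on `tar -tzf`. The member names are your own expectations. The example path `models/all-MiniLM-L6-v2.onnx` mirrors the Offline Kit layout from section 1 and is an assumption about the tarball's internal structure.

```sh
#!/bin/sh
# Hypothetical check that a runtime-assets tarball contains the given members.
# Usage: tarball_has ARCHIVE MEMBER...
tarball_has() {
  archive="$1"; shift
  listing=$(tar -tzf "$archive") || return 1
  for member in "$@"; do
    # Fixed-string, whole-line match; accept both "models/x" and "./models/x".
    if ! printf '%s\n' "$listing" | grep -qxF -e "$member" -e "./$member"; then
      echo "missing from $archive: $member" >&2
      return 1
    fi
  done
}
```

For example, run `tarball_has stella-ops-runtime-assets-<date>.tar.gz models/all-MiniLM-L6-v2.onnx` with `<date>` replaced by the actual file name, then extract only if it succeeds.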

---

## Checklist: before you ship a release

- [ ] `models/all-MiniLM-L6-v2.onnx` contains real weights (not the 120-byte placeholder)
- [ ] `acquire.sh --verify` passes all checksums
- [ ] Certificates are production-issued (not `*-dev.*`)
- [ ] Evidence storage volume is provisioned with adequate capacity
- [ ] Regional crypto profile is selected if applicable
- [ ] Offline Kit includes runtime assets tarball if deploying to air-gap
- [ ] `NOTICE.md` and `third-party-licenses/` are included in the image

---

## Related documentation

- Installation guide: `docs/INSTALL_GUIDE.md`
- Offline Update Kit: `docs/OFFLINE_KIT.md`
- Security hardening: `docs/SECURITY_HARDENING_GUIDE.md`
- Ghidra deployment: `docs/modules/binary-index/ghidra-deployment.md`
- LLM model bundles (separate from ONNX): `docs/modules/advisory-ai/guides/offline-model-bundles.md`
- Third-party dependencies: `docs/legal/THIRD-PARTY-DEPENDENCIES.md`
- Compose profiles: `devops/compose/README.md`