Enrich the setup, apply setup fixes, and minimize the consolidation plan

This commit is contained in:
master
2026-02-26 08:46:06 +02:00
parent 63c70a6d37
commit 4fe8eb56ae
26 changed files with 1568 additions and 646 deletions


@@ -0,0 +1,55 @@
# ---------------------------------------------------------------------------
# Dockerfile.runtime-assets
#
# Lightweight data image that packages runtime assets (ML models, JDK, Ghidra,
# certificates) into named volumes for Stella Ops services.
#
# Usage:
# 1. Acquire assets first:
# ./devops/runtime-assets/acquire.sh --all
#
# 2. Build the data image:
# docker build -f devops/runtime-assets/Dockerfile.runtime-assets \
# -t stellaops/runtime-assets:latest .
#
# 3. Use in docker-compose (see docker-compose.runtime-assets.yml)
#
# The image runs a one-shot copy to populate named volumes, then exits.
# Services mount the same volumes read-only.
# ---------------------------------------------------------------------------
FROM busybox:1.37 AS base
LABEL org.opencontainers.image.title="stellaops-runtime-assets"
LABEL org.opencontainers.image.description="Runtime data assets for Stella Ops (ML models, certificates, tools)"
LABEL org.opencontainers.image.vendor="stella-ops.org"
# ---------------------------------------------------------------------------
# ML Models
# ---------------------------------------------------------------------------
COPY src/AdvisoryAI/StellaOps.AdvisoryAI/models/ /data/models/
# ---------------------------------------------------------------------------
# Certificates and trust bundles
# ---------------------------------------------------------------------------
COPY etc/trust-profiles/assets/ /data/certificates/trust-profiles/
COPY etc/authority/keys/ /data/certificates/authority/
# ---------------------------------------------------------------------------
# License attribution (required by Apache-2.0 and MIT)
# ---------------------------------------------------------------------------
COPY NOTICE.md /data/licenses/NOTICE.md
COPY third-party-licenses/ /data/licenses/third-party/
# ---------------------------------------------------------------------------
# Manifest for verification
# ---------------------------------------------------------------------------
COPY devops/runtime-assets/manifest.yaml /data/manifest.yaml
# ---------------------------------------------------------------------------
# Entrypoint: copy assets to volume mount points, then exit
# ---------------------------------------------------------------------------
COPY devops/runtime-assets/init-volumes.sh /init-volumes.sh
RUN chmod +x /init-volumes.sh
ENTRYPOINT ["/init-volumes.sh"]


@@ -0,0 +1,392 @@
# Runtime Data Assets
Runtime data assets are files that Stella Ops services need at runtime but that
are **not produced by `dotnet publish`** or the Angular build. They must be
provisioned separately — either baked into Docker images, mounted as volumes, or
supplied via an init container.
This directory contains the canonical inventory, acquisition scripts, and
packaging tools for all such assets.
**If you are setting up Stella Ops for the first time**, read this document
before running `docker compose up`. Services will start without these assets but
will operate in degraded mode (no semantic search, no binary analysis, dev-only
certificates).
---
## Quick reference
| Category | Required? | Size | Provisioned by |
|---|---|---|---|
| [ML model weights](#1-ml-model-weights) | Yes (for semantic search) | ~80 MB | `acquire.sh` |
| [JDK + Ghidra](#2-jdk--ghidra) | Optional (binary analysis) | ~1.6 GB | `acquire.sh` |
| [Search seed snapshots](#3-search-seed-snapshots) | Yes (first boot) | ~7 KB | Included in source |
| [Translations (i18n)](#4-translations-i18n) | Yes | ~500 KB | Baked into Angular dist |
| [Certificates and trust stores](#5-certificates-and-trust-stores) | Yes | ~50 KB | `etc/` + volume mounts |
| [Regional crypto configuration](#6-regional-crypto-configuration) | Per region | ~20 KB | Compose overlays |
| [Evidence storage](#7-evidence-storage) | Yes | Grows | Persistent named volume |
| [Vulnerability feeds](#8-vulnerability-feeds) | Yes (offline) | ~300 MB | Offline Kit (`docs/OFFLINE_KIT.md`) |
---
## 1. ML model weights
**What:** The `all-MiniLM-L6-v2` sentence-transformer model in ONNX format,
used by `OnnxVectorEncoder` for semantic vector search in AdvisoryAI.
**License:** Apache-2.0 (compatible with BUSL-1.1; see `third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt`).
**Where it goes:**
```
<app-root>/models/all-MiniLM-L6-v2.onnx
```
Configurable via `KnowledgeSearch__OnnxModelPath` environment variable.
**How to acquire:**
```bash
# Option A: use the acquisition script (recommended)
./devops/runtime-assets/acquire.sh --models
# Option B: manual download
mkdir -p src/AdvisoryAI/StellaOps.AdvisoryAI/models
curl -L https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx \
-o src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
```
**Verification:**
```bash
sha256sum src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
# Expected: see manifest.yaml for pinned digest
```
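The pinned digest lives under the `sha256:` key of each asset entry in `manifest.yaml`. A small helper can automate the comparison; the awk-based extraction below is a convenience sketch tied to the manifest layout in this directory, not an official tool:

```bash
# Compare a local asset against the sha256 pinned in manifest.yaml.
# Assumes the manifest layout in this directory: an asset block (e.g.
# "onnx-embedding-model:") containing a quoted "sha256:" key.
verify_pinned() {
  manifest="$1"; asset="$2"; key="$3"
  pinned=$(awk "/$key:/,/sha256:/" "$manifest" | awk -F'"' '/sha256:/ {print $2; exit}')
  actual=$(sha256sum "$asset" | awk '{print $1}')
  if [ -n "$pinned" ] && [ "$pinned" = "$actual" ]; then
    echo "OK: $asset matches pinned digest"
  else
    echo "MISMATCH: expected '$pinned', got '$actual'" >&2
    return 1
  fi
}

# Usage (from the repo root):
#   verify_pinned devops/runtime-assets/manifest.yaml \
#     src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx \
#     onnx-embedding-model
```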
**Degraded mode:** If the model file is missing or is a placeholder, the encoder
falls back to a deterministic character-ngram projection. Search works but
semantic quality is significantly reduced.
**Docker / Compose mount:**
```yaml
services:
advisory-ai-web:
volumes:
- ml-models:/app/models:ro
volumes:
ml-models:
driver: local
```
**Air-gap:** Include the `.onnx` file in the Offline Kit under
`models/all-MiniLM-L6-v2.onnx`. The `acquire.sh --package` command produces a
verified tarball for sneakernet transfer.
---
## 2. JDK + Ghidra
**What:** OpenJDK 17+ runtime and Ghidra 11.x installation for headless binary
analysis (decompilation, BSim similarity, call-graph extraction).
**License:** OpenJDK — GPLv2+CE (Classpath Exception, allows linking); Ghidra —
Apache-2.0 (NSA release).
**Required only when:** `GhidraOptions__Enabled=true` (default). Set to `false`
to skip entirely if binary analysis is not needed.
**Where it goes:**
```
/opt/java/openjdk/ # JDK installation (JAVA_HOME)
/opt/ghidra/ # Ghidra installation (GhidraOptions__GhidraHome)
/tmp/stellaops-ghidra/ # Workspace (GhidraOptions__WorkDir) — writable
```
**How to acquire:**
```bash
# Option A: use the acquisition script
./devops/runtime-assets/acquire.sh --ghidra
# Option B: manual
# JDK (Eclipse Temurin 17)
curl -L https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jre_x64_linux_hotspot_17.0.13_11.tar.gz \
| tar -xz -C /opt/java/
# Ghidra 11.2
curl -L https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20241105.zip \
-o ghidra.zip && unzip ghidra.zip -d /opt/ghidra/
```
**Docker:** For services that need Ghidra, use a dedicated Dockerfile stage or a
sidecar data image. See `docs/modules/binary-index/ghidra-deployment.md`.
**Air-gap:** Pre-download both archives on a connected machine and include them
in the Offline Kit under `tools/jdk/` and `tools/ghidra/`.
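Wired together, the install locations above translate into environment configuration along these lines (a sketch using the variable names from this section; adjust values per deployment):

```bash
# Environment wiring for a Ghidra-enabled service, matching the layout above.
export JAVA_HOME=/opt/java/openjdk            # JDK installation
export GhidraOptions__Enabled=true            # enable binary analysis
export GhidraOptions__GhidraHome=/opt/ghidra  # Ghidra installation
export GhidraOptions__WorkDir=/tmp/stellaops-ghidra
mkdir -p "$GhidraOptions__WorkDir"            # workspace must exist and be writable
```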
---
## 3. Search seed snapshots
**What:** Small JSON files that bootstrap the unified search index on first
start. Without them, search returns empty results until live data adapters
populate the index.
**Where they are:**
```
src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/
findings.snapshot.json (1.3 KB)
vex.snapshot.json (1.2 KB)
policy.snapshot.json (1.2 KB)
graph.snapshot.json (758 B)
scanner.snapshot.json (751 B)
opsmemory.snapshot.json (1.1 KB)
timeline.snapshot.json (824 B)
```
**How they get into the image:** The `.csproj` copies them to the output
directory via `<Content>` items. They are included in `dotnet publish` output
automatically.
**Runtime behavior:** `UnifiedSearchIndexer` loads them at startup and refreshes
from live data adapters every 300 seconds (`UnifiedSearch__AutoRefreshIntervalSeconds`).
**No separate provisioning needed** unless you want to supply custom seed data,
in which case mount a volume at the snapshot path and set:
```
KnowledgeSearch__UnifiedFindingsSnapshotPath=/app/snapshots/findings.snapshot.json
```
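If you do supply custom seed files, a quick structural check before mounting avoids a confusing startup. The sketch below validates JSON syntax only (it assumes `jq` is available and says nothing about the snapshot schema):

```bash
# Validate that every custom snapshot in a directory parses as JSON.
check_snapshots() {
  dir="$1"
  for f in "$dir"/*.snapshot.json; do
    # jq empty parses the file and produces no output; nonzero on bad JSON.
    jq empty "$f" || return 1
    echo "valid: $f"
  done
}

# Usage: check_snapshots /path/to/custom-snapshots
```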
---
## 4. Translations (i18n)
**What:** JSON translation bundles for the Angular frontend, supporting 9
locales: en-US, de-DE, bg-BG, ru-RU, es-ES, fr-FR, uk-UA, zh-CN, zh-TW.
**Where they are:**
```
src/Web/StellaOps.Web/src/i18n/*.common.json
```
**How they get into the image:** Compiled into the Angular `dist/` bundle during
`npm run build`. The console Docker image (`devops/docker/Dockerfile.console`)
includes them automatically.
**Runtime overrides:** The backend `TranslationRegistry` supports
database-backed translation overrides (priority 100) over file-based bundles
(priority 10). For custom translations in offline environments, seed the
database or mount override JSON files.
**No separate provisioning needed** for standard deployments.
---
## 5. Certificates and trust stores
**What:** TLS certificates, signing keys, and CA trust bundles for inter-service
communication and attestation verification.
**Development defaults (not for production):**
```
etc/authority/keys/
kestrel-dev.pfx # Kestrel TLS (password: devpass)
kestrel-dev.crt / .key
ack-token-dev.pem # Token signing key
signing-dev.pem # Service signing key
etc/trust-profiles/assets/
ca.crt # Root CA bundle
rekor-public.pem # Rekor transparency log public key
```
**Compose mounts (already configured):**
```yaml
volumes:
- ../../etc/authority/keys:/app/etc/certs:ro
- ./combined-ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:ro
```
**Production:** Replace dev certificates with properly issued certificates.
Mount as read-only volumes. See `docs/SECURITY_HARDENING_GUIDE.md`.
**Air-gap:** Include the full trust chain in the Offline Kit. For Russian
deployments, include `certificates/russian_trusted_bundle.pem` (see
`docs/OFFLINE_KIT.md`).
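A quick inspection pass before mounting replacement certificates catches expired or mis-chained files early. This helper is a sketch; the paths in the usage note are the dev defaults from above:

```bash
# Inspect a certificate and confirm it chains to the deployed trust bundle.
check_cert() {
  cert="$1"; ca="$2"
  # Subject and expiry of the certificate about to be mounted.
  openssl x509 -in "$cert" -noout -subject -enddate || return 1
  # Chain validation. May fail for the self-signed dev certificate;
  # it must pass with production-issued certificates.
  openssl verify -CAfile "$ca" "$cert"
}

# Usage with the dev defaults from this section:
#   check_cert etc/authority/keys/kestrel-dev.crt etc/trust-profiles/assets/ca.crt
```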
---
## 6. Regional crypto configuration
**What:** YAML configuration files that select the cryptographic profile
(algorithms, key types, HSM settings) per deployment region.
**Files:**
```
etc/appsettings.crypto.international.yaml # Default (ECDSA/RSA/EdDSA)
etc/appsettings.crypto.eu.yaml # eIDAS qualified signatures
etc/appsettings.crypto.russia.yaml # GOST R 34.10/34.11
etc/appsettings.crypto.china.yaml # SM2/SM3/SM4
etc/crypto-plugins-manifest.json # Plugin registry
```
**Selection:** Via Docker Compose overlays:
```bash
# EU deployment
docker compose -f docker-compose.stella-ops.yml \
-f docker-compose.compliance-eu.yml up -d
```
**No separate provisioning needed** — files ship in the source tree and are
selected by compose overlay. See `devops/compose/README.md` for details.
---
## 7. Evidence storage
**What:** Persistent storage for evidence bundles (SBOMs, attestations,
signatures, scan proofs). Grows with usage.
**Default path:** `/data/evidence` (named volume `evidence-data`).
**Configured via:** `EvidenceLocker__ObjectStore__FileSystem__RootPath`
**Compose (already configured):**
```yaml
volumes:
evidence-data:
driver: local
```
**Sizing:** Plan ~1 GB per 1000 scans as a rough baseline. Monitor with
Prometheus metric `evidence_locker_storage_bytes_total`.
**Backup:** Back up evidence files alongside your PostgreSQL backups. Evidence
files are content-addressed and immutable (append-only), so they are safe to
rsync incrementally.
---
## 8. Vulnerability feeds
**What:** Merged advisory feeds (OSV, GHSA, NVD 2.0, and regional feeds).
Required for offline vulnerability matching.
**Provisioned by:** The Offline Update Kit (`docs/OFFLINE_KIT.md`). This is a
separate, well-documented workflow. See that document for full details.
**Not covered by `acquire.sh`** — feed management is handled by the Concelier
module and the Offline Kit import pipeline.
---
## Acquisition script
The `acquire.sh` script automates downloading, verifying, and staging runtime
data assets. It is idempotent — safe to run multiple times.
```bash
# Acquire everything (models + Ghidra + JDK)
./devops/runtime-assets/acquire.sh --all
# Models only (for environments without binary analysis)
./devops/runtime-assets/acquire.sh --models
# Ghidra + JDK only
./devops/runtime-assets/acquire.sh --ghidra
# Package all acquired assets into a portable tarball for air-gap transfer
./devops/runtime-assets/acquire.sh --package
# Verify already-acquired assets against pinned checksums
./devops/runtime-assets/acquire.sh --verify
```
Asset checksums are pinned in `manifest.yaml` in this directory. The script
verifies SHA-256 digests after every download and refuses corrupted files.
---
## Docker integration
### Option A: Bake into image (simplest)
Run `acquire.sh --models` before `docker build`. The `.csproj` copies
`models/all-MiniLM-L6-v2.onnx` into the publish output automatically.
### Option B: Shared data volume (recommended for production)
Build a lightweight data image or use an init container:
```dockerfile
# Dockerfile.runtime-assets
FROM busybox:1.37
COPY models/ /data/models/
VOLUME /data/models
```
Mount in compose:
```yaml
services:
advisory-ai-web:
volumes:
- runtime-assets:/app/models:ro
depends_on:
runtime-assets-init:
condition: service_completed_successfully
runtime-assets-init:
build:
context: .
dockerfile: devops/runtime-assets/Dockerfile.runtime-assets
volumes:
- runtime-assets:/data/models
volumes:
runtime-assets:
```
### Option C: Air-gap tarball
```bash
./devops/runtime-assets/acquire.sh --package
# Produces: out/runtime-assets/stella-ops-runtime-assets-<date>.tar.gz
# Transfer to air-gapped host, then:
tar -xzf stella-ops-runtime-assets-*.tar.gz -C /opt/stellaops/
```
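`--package` also writes a `.sha256` sidecar next to the tarball. On the air-gapped host, verify before extracting; the helper below is a sketch that assumes the sidecar travelled with the tarball (the function name is illustrative):

```bash
# Verify a transferred tarball against its sidecar checksum, then extract.
verify_and_extract() {
  tarball="$1"; dest="$2"
  # Sidecar format: "<digest>  <filename>"; the first field is the digest.
  expected=$(awk '{print $1}' "${tarball}.sha256") || return 1
  actual=$(sha256sum "$tarball" | awk '{print $1}') || return 1
  if [ -n "$expected" ] && [ "$expected" = "$actual" ]; then
    tar -xzf "$tarball" -C "$dest"
  else
    echo "checksum mismatch for $tarball; refusing to extract" >&2
    return 1
  fi
}

# Usage on the air-gapped host:
#   verify_and_extract stella-ops-runtime-assets-<date>.tar.gz /opt/stellaops/
```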
---
## Checklist: before you ship a release
- [ ] `models/all-MiniLM-L6-v2.onnx` contains real weights (not the 120-byte placeholder)
- [ ] `acquire.sh --verify` passes all checksums
- [ ] Certificates are production-issued (not `*-dev.*`)
- [ ] Evidence storage volume is provisioned with adequate capacity
- [ ] Regional crypto profile is selected if applicable
- [ ] Offline Kit includes runtime assets tarball if deploying to air-gap
- [ ] `NOTICE.md` and `third-party-licenses/` are included in the image
---
## Related documentation
- Installation guide: `docs/INSTALL_GUIDE.md`
- Offline Update Kit: `docs/OFFLINE_KIT.md`
- Security hardening: `docs/SECURITY_HARDENING_GUIDE.md`
- Ghidra deployment: `docs/modules/binary-index/ghidra-deployment.md`
- LLM model bundles (separate from ONNX): `docs/modules/advisory-ai/guides/offline-model-bundles.md`
- Third-party dependencies: `docs/legal/THIRD-PARTY-DEPENDENCIES.md`
- Compose profiles: `devops/compose/README.md`


@@ -0,0 +1,389 @@
#!/usr/bin/env bash
# ---------------------------------------------------------------------------
# acquire.sh — Download, verify, and stage Stella Ops runtime data assets.
#
# Usage:
# ./devops/runtime-assets/acquire.sh --all # everything
# ./devops/runtime-assets/acquire.sh --models # ONNX embedding model only
# ./devops/runtime-assets/acquire.sh --ghidra # JDK + Ghidra only
# ./devops/runtime-assets/acquire.sh --verify # verify existing assets
# ./devops/runtime-assets/acquire.sh --package # create air-gap tarball
#
# The script is idempotent: re-running skips already-verified assets.
# Downloads are sanity-checked; compare the reported SHA-256 digests against manifest.yaml.
# ---------------------------------------------------------------------------
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
REPO_ROOT="$(cd "$SCRIPT_DIR/../.." && pwd)"
MANIFEST="$SCRIPT_DIR/manifest.yaml"
STAGING_DIR="${STAGING_DIR:-$REPO_ROOT/out/runtime-assets}"
# Colors (disabled if not a terminal)
if [[ -t 1 ]]; then
RED='\033[0;31m'; GREEN='\033[0;32m'; YELLOW='\033[1;33m'; NC='\033[0m'
else
RED=''; GREEN=''; YELLOW=''; NC=''
fi
log_info() { echo -e "${GREEN}[acquire]${NC} $*"; }
log_warn() { echo -e "${YELLOW}[acquire]${NC} $*" >&2; }
log_error() { echo -e "${RED}[acquire]${NC} $*" >&2; }
# ---------------------------------------------------------------------------
# Asset paths and URLs (duplicated from manifest.yaml; keep the two in sync)
# ---------------------------------------------------------------------------
ONNX_MODEL_URL="https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx"
ONNX_MODEL_DEST="$REPO_ROOT/src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx"
JDK_URL="https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jre_x64_linux_hotspot_17.0.13_11.tar.gz"
JDK_DEST="$STAGING_DIR/jdk"
GHIDRA_URL="https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20241105.zip"
GHIDRA_DEST="$STAGING_DIR/ghidra"
# ---------------------------------------------------------------------------
# Helpers
# ---------------------------------------------------------------------------
check_prerequisites() {
local missing=()
command -v curl >/dev/null 2>&1 || missing+=("curl")
command -v sha256sum >/dev/null 2>&1 || {
# macOS uses shasum
command -v shasum >/dev/null 2>&1 || missing+=("sha256sum or shasum")
}
if [[ ${#missing[@]} -gt 0 ]]; then
log_error "Missing required tools: ${missing[*]}"
exit 1
fi
}
compute_sha256() {
local file="$1"
if command -v sha256sum >/dev/null 2>&1; then
sha256sum "$file" | awk '{print $1}'
else
shasum -a 256 "$file" | awk '{print $1}'
fi
}
download_with_progress() {
local url="$1" dest="$2" label="$3"
log_info "Downloading $label..."
log_info " URL: $url"
log_info " Dest: $dest"
mkdir -p "$(dirname "$dest")"
if ! curl -fL --progress-bar -o "$dest" "$url"; then
log_error "Download failed: $label"
rm -f "$dest"
return 1
fi
local size
size=$(wc -c < "$dest" 2>/dev/null || echo "unknown")
log_info " Downloaded: $size bytes"
}
is_placeholder() {
local file="$1"
if [[ ! -f "$file" ]]; then
return 0 # missing = placeholder
fi
local size
size=$(wc -c < "$file" 2>/dev/null || echo "0")
# The current placeholder is ~120 bytes; real model is ~80 MB
if [[ "$size" -lt 1000 ]]; then
return 0 # too small to be real
fi
return 1
}
# ---------------------------------------------------------------------------
# Acquisition functions
# ---------------------------------------------------------------------------
acquire_models() {
log_info "=== ML Models ==="
if is_placeholder "$ONNX_MODEL_DEST"; then
download_with_progress "$ONNX_MODEL_URL" "$ONNX_MODEL_DEST" "all-MiniLM-L6-v2 ONNX model"
if is_placeholder "$ONNX_MODEL_DEST"; then
log_error "Downloaded file appears to be invalid (too small)."
return 1
fi
local digest
digest=$(compute_sha256 "$ONNX_MODEL_DEST")
log_info " SHA-256: $digest"
log_info " Update manifest.yaml with this digest for future verification."
else
log_info "ONNX model already present and valid: $ONNX_MODEL_DEST"
fi
log_info "ML models: OK"
}
acquire_ghidra() {
log_info "=== JDK + Ghidra ==="
mkdir -p "$STAGING_DIR"
# JDK
local jdk_archive="$STAGING_DIR/jdk.tar.gz"
if [[ ! -d "$JDK_DEST" ]] || [[ -z "$(ls -A "$JDK_DEST" 2>/dev/null)" ]]; then
download_with_progress "$JDK_URL" "$jdk_archive" "Eclipse Temurin JRE 17"
mkdir -p "$JDK_DEST"
tar -xzf "$jdk_archive" -C "$JDK_DEST" --strip-components=1
rm -f "$jdk_archive"
log_info " JDK extracted to: $JDK_DEST"
else
log_info "JDK already present: $JDK_DEST"
fi
# Ghidra
local ghidra_archive="$STAGING_DIR/ghidra.zip"
if [[ ! -d "$GHIDRA_DEST" ]] || [[ -z "$(ls -A "$GHIDRA_DEST" 2>/dev/null)" ]]; then
download_with_progress "$GHIDRA_URL" "$ghidra_archive" "Ghidra 11.2"
mkdir -p "$GHIDRA_DEST"
if command -v unzip >/dev/null 2>&1; then
unzip -q "$ghidra_archive" -d "$GHIDRA_DEST"
else
log_error "unzip not found. Install unzip to extract Ghidra."
return 1
fi
rm -f "$ghidra_archive"
log_info " Ghidra extracted to: $GHIDRA_DEST"
else
log_info "Ghidra already present: $GHIDRA_DEST"
fi
log_info "JDK + Ghidra: OK"
}
# ---------------------------------------------------------------------------
# Verification
# ---------------------------------------------------------------------------
verify_assets() {
log_info "=== Verifying runtime assets ==="
local errors=0
# ONNX model
if is_placeholder "$ONNX_MODEL_DEST"; then
log_warn "ONNX model is missing or placeholder: $ONNX_MODEL_DEST"
log_warn " Semantic search will use degraded fallback encoder."
errors=$((errors + 1))  # note: ((errors++)) evaluates to 0 on first use and would abort under set -e
else
local digest
digest=$(compute_sha256 "$ONNX_MODEL_DEST")
log_info "ONNX model: present ($digest)"
fi
# Search snapshots
local snapshot_dir="$REPO_ROOT/src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots"
local snapshot_count=0
for f in findings vex policy graph scanner opsmemory timeline; do
if [[ -f "$snapshot_dir/$f.snapshot.json" ]]; then
snapshot_count=$((snapshot_count + 1))
fi
done
if [[ $snapshot_count -eq 7 ]]; then
log_info "Search snapshots: all 7 present"
else
log_warn "Search snapshots: $snapshot_count/7 present in $snapshot_dir"
errors=$((errors + 1))
fi
# Certificates
if [[ -f "$REPO_ROOT/etc/authority/keys/kestrel-dev.pfx" ]]; then
log_info "Dev certificates: present (replace for production)"
else
log_warn "Dev certificates: missing in etc/authority/keys/"
errors=$((errors + 1))
fi
# Trust bundle
if [[ -f "$REPO_ROOT/etc/trust-profiles/assets/ca.crt" ]]; then
log_info "CA trust bundle: present"
else
log_warn "CA trust bundle: missing in etc/trust-profiles/assets/"
errors=$((errors + 1))
fi
# Translations
local i18n_dir="$REPO_ROOT/src/Web/StellaOps.Web/src/i18n"
local locale_count=0
for locale in en-US de-DE bg-BG ru-RU es-ES fr-FR uk-UA zh-CN zh-TW; do
if [[ -f "$i18n_dir/$locale.common.json" ]]; then
locale_count=$((locale_count + 1))
fi
done
if [[ $locale_count -eq 9 ]]; then
log_info "Translations: all 9 locales present"
else
log_warn "Translations: $locale_count/9 locales present"
errors=$((errors + 1))
fi
# License files
if [[ -f "$REPO_ROOT/third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt" ]]; then
log_info "License attribution: ONNX model license present"
else
log_warn "License attribution: missing third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt"
errors=$((errors + 1))
fi
if [[ -f "$REPO_ROOT/NOTICE.md" ]]; then
log_info "NOTICE.md: present"
else
log_warn "NOTICE.md: missing"
errors=$((errors + 1))
fi
# JDK + Ghidra (optional)
if [[ -d "$JDK_DEST" ]] && [[ -n "$(ls -A "$JDK_DEST" 2>/dev/null)" ]]; then
log_info "JDK: present at $JDK_DEST"
else
log_info "JDK: not staged (optional — only needed for Ghidra)"
fi
if [[ -d "$GHIDRA_DEST" ]] && [[ -n "$(ls -A "$GHIDRA_DEST" 2>/dev/null)" ]]; then
log_info "Ghidra: present at $GHIDRA_DEST"
else
log_info "Ghidra: not staged (optional — only needed for binary analysis)"
fi
echo ""
if [[ $errors -gt 0 ]]; then
log_warn "Verification completed with $errors warning(s)."
return 1
else
log_info "All runtime assets verified."
return 0
fi
}
# ---------------------------------------------------------------------------
# Packaging (air-gap tarball)
# ---------------------------------------------------------------------------
package_assets() {
log_info "=== Packaging runtime assets for air-gap transfer ==="
local pkg_dir="$STAGING_DIR/package"
local timestamp
timestamp=$(date -u +"%Y%m%d")
local tarball="$STAGING_DIR/stella-ops-runtime-assets-${timestamp}.tar.gz"
rm -rf "$pkg_dir"
mkdir -p "$pkg_dir/models" "$pkg_dir/certificates" "$pkg_dir/licenses"
# ONNX model
if ! is_placeholder "$ONNX_MODEL_DEST"; then
cp "$ONNX_MODEL_DEST" "$pkg_dir/models/all-MiniLM-L6-v2.onnx"
log_info " Included: ONNX model"
else
log_warn " Skipped: ONNX model (placeholder — run --models first)"
fi
# JDK
if [[ -d "$JDK_DEST" ]] && [[ -n "$(ls -A "$JDK_DEST" 2>/dev/null)" ]]; then
cp -r "$JDK_DEST" "$pkg_dir/jdk"
log_info " Included: JDK"
fi
# Ghidra
if [[ -d "$GHIDRA_DEST" ]] && [[ -n "$(ls -A "$GHIDRA_DEST" 2>/dev/null)" ]]; then
cp -r "$GHIDRA_DEST" "$pkg_dir/ghidra"
log_info " Included: Ghidra"
fi
# Certificates
if [[ -d "$REPO_ROOT/etc/trust-profiles/assets" ]]; then
cp -r "$REPO_ROOT/etc/trust-profiles/assets/"* "$pkg_dir/certificates/" 2>/dev/null || true
log_info " Included: trust profile assets"
fi
# License files
cp "$REPO_ROOT/NOTICE.md" "$pkg_dir/licenses/"
cp -r "$REPO_ROOT/third-party-licenses/"* "$pkg_dir/licenses/" 2>/dev/null || true
log_info " Included: license files"
# Manifest
cp "$MANIFEST" "$pkg_dir/manifest.yaml"
# Create tarball (deterministic: sorted, zero mtime/uid/gid)
tar --sort=name \
--mtime='2024-01-01 00:00:00' \
--owner=0 --group=0 \
-czf "$tarball" \
-C "$pkg_dir" .
local digest
digest=$(compute_sha256 "$tarball")
echo "$digest  $(basename "$tarball")" > "${tarball}.sha256"  # two spaces between digest and name so `sha256sum -c` accepts the file
log_info "Package created: $tarball"
log_info " SHA-256: $digest"
log_info " Transfer this file to the air-gapped host."
rm -rf "$pkg_dir"
}
# ---------------------------------------------------------------------------
# Main
# ---------------------------------------------------------------------------
usage() {
cat <<EOF
Usage: $0 [OPTIONS]
Options:
--all Download and verify all runtime assets (models + Ghidra + JDK)
--models Download ONNX embedding model only
--ghidra Download JDK + Ghidra only
--verify Verify existing assets against manifest
--package Create air-gap transfer tarball from acquired assets
-h, --help Show this help
Environment variables:
STAGING_DIR Override staging directory (default: <repo>/out/runtime-assets)
EOF
}
main() {
if [[ $# -eq 0 ]]; then
usage
exit 0
fi
check_prerequisites
local do_models=false do_ghidra=false do_verify=false do_package=false
while [[ $# -gt 0 ]]; do
case "$1" in
--all) do_models=true; do_ghidra=true ;;
--models) do_models=true ;;
--ghidra) do_ghidra=true ;;
--verify) do_verify=true ;;
--package) do_package=true ;;
-h|--help) usage; exit 0 ;;
*) log_error "Unknown option: $1"; usage; exit 1 ;;
esac
shift
done
log_info "Repo root: $REPO_ROOT"
log_info "Staging dir: $STAGING_DIR"
echo ""
[[ "$do_models" == "true" ]] && acquire_models
[[ "$do_ghidra" == "true" ]] && acquire_ghidra
[[ "$do_verify" == "true" ]] && verify_assets
[[ "$do_package" == "true" ]] && package_assets
echo ""
log_info "Done."
}
main "$@"


@@ -0,0 +1,61 @@
# ---------------------------------------------------------------------------
# docker-compose.runtime-assets.yml
#
# Overlay that provisions shared runtime data volumes (ML models, certificates,
# licenses) via an init container. Use alongside the main compose file:
#
# docker compose -f docker-compose.stella-ops.yml \
# -f devops/runtime-assets/docker-compose.runtime-assets.yml \
# up -d
#
# The init container runs once, copies assets into named volumes, and exits.
# Services mount the same volumes read-only.
#
# Prerequisites:
# 1. Run ./devops/runtime-assets/acquire.sh --models (at minimum)
# 2. Build: docker build -f devops/runtime-assets/Dockerfile.runtime-assets \
# -t stellaops/runtime-assets:latest .
# ---------------------------------------------------------------------------
services:
# Init container: populates shared volumes, then exits
runtime-assets-init:
image: stellaops/runtime-assets:latest
build:
context: ../..
dockerfile: devops/runtime-assets/Dockerfile.runtime-assets
volumes:
- stellaops-models:/mnt/models
- stellaops-certificates:/mnt/certificates
- stellaops-licenses:/mnt/licenses
restart: "no"
# Override AdvisoryAI to mount the models volume
advisory-ai-web:
volumes:
- stellaops-models:/app/models:ro
- stellaops-certificates:/app/etc/certs:ro
environment:
KnowledgeSearch__OnnxModelPath: "/app/models/all-MiniLM-L6-v2.onnx"
depends_on:
runtime-assets-init:
condition: service_completed_successfully
advisory-ai-worker:
volumes:
- stellaops-models:/app/models:ro
- stellaops-certificates:/app/etc/certs:ro
environment:
KnowledgeSearch__OnnxModelPath: "/app/models/all-MiniLM-L6-v2.onnx"
depends_on:
runtime-assets-init:
condition: service_completed_successfully
volumes:
stellaops-models:
name: stellaops-models
stellaops-certificates:
name: stellaops-certificates
stellaops-licenses:
name: stellaops-licenses


@@ -0,0 +1,63 @@
#!/bin/sh
# ---------------------------------------------------------------------------
# init-volumes.sh — One-shot init container script.
#
# Copies runtime data assets from the data image into mounted volumes.
# Runs as part of docker-compose.runtime-assets.yml and exits when done.
#
# Mount points (set via environment or defaults):
# MODELS_DEST /mnt/models -> ML model weights
# CERTS_DEST /mnt/certificates -> Certificates and trust bundles
# LICENSES_DEST /mnt/licenses -> License attribution files
# ---------------------------------------------------------------------------
set -e
MODELS_DEST="${MODELS_DEST:-/mnt/models}"
CERTS_DEST="${CERTS_DEST:-/mnt/certificates}"
LICENSES_DEST="${LICENSES_DEST:-/mnt/licenses}"
log() { echo "[init-volumes] $*"; }
# Models
if [ -d /data/models ] && [ "$(ls -A /data/models 2>/dev/null)" ]; then
log "Copying ML models to $MODELS_DEST..."
mkdir -p "$MODELS_DEST"
# -n (no-clobber) preserves files already in the volume; some cp builds lack -n, so fall back to a plain copy
cp -rn /data/models/* "$MODELS_DEST/" 2>/dev/null || cp -r /data/models/* "$MODELS_DEST/"
log " Models ready."
else
log " No models found in /data/models (semantic search will use fallback)."
fi
# Certificates
if [ -d /data/certificates ] && [ "$(ls -A /data/certificates 2>/dev/null)" ]; then
log "Copying certificates to $CERTS_DEST..."
mkdir -p "$CERTS_DEST"
cp -rn /data/certificates/* "$CERTS_DEST/" 2>/dev/null || cp -r /data/certificates/* "$CERTS_DEST/"
log " Certificates ready."
else
log " No certificates found in /data/certificates."
fi
# Licenses
if [ -d /data/licenses ] && [ "$(ls -A /data/licenses 2>/dev/null)" ]; then
log "Copying license files to $LICENSES_DEST..."
mkdir -p "$LICENSES_DEST"
cp -rn /data/licenses/* "$LICENSES_DEST/" 2>/dev/null || cp -r /data/licenses/* "$LICENSES_DEST/"
log " Licenses ready."
else
log " No license files found in /data/licenses."
fi
# Verify ONNX model is real (not placeholder)
ONNX_FILE="$MODELS_DEST/all-MiniLM-L6-v2.onnx"
if [ -f "$ONNX_FILE" ]; then
SIZE=$(wc -c < "$ONNX_FILE" 2>/dev/null || echo 0)
if [ "$SIZE" -lt 1000 ]; then
log " WARNING: ONNX model at $ONNX_FILE is only $SIZE bytes (placeholder?)."
log " Run ./devops/runtime-assets/acquire.sh --models to download real weights."
else
log " ONNX model verified: $SIZE bytes."
fi
fi
log "Init complete."


@@ -0,0 +1,204 @@
# Runtime Data Assets Manifest
# Pinned versions, checksums, and licensing for all runtime data assets.
# Used by acquire.sh for download verification and by CI for release gating.
#
# To update a pinned version:
# 1. Change the entry below
# 2. Run: ./devops/runtime-assets/acquire.sh --verify
# 3. Update NOTICE.md and third-party-licenses/ if license changed
version: "1.0.0"
updated: "2026-02-25"
assets:
# ---------------------------------------------------------------------------
# ML Models
# ---------------------------------------------------------------------------
onnx-embedding-model:
name: "all-MiniLM-L6-v2 (ONNX)"
category: "ml-models"
required: true
degraded_without: true # falls back to character-ngram encoder
source: "https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx"
license: "Apache-2.0"
license_file: "third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt"
notice_entry: true # listed in NOTICE.md
destination: "src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx"
runtime_path: "models/all-MiniLM-L6-v2.onnx"
env_override: "KnowledgeSearch__OnnxModelPath"
size_approx: "80 MB"
sha256: "6fd5d72fe4589f189f8ebc006442dbb529bb7ce38f8082112682524616046452"
used_by:
- "StellaOps.AdvisoryAI (OnnxVectorEncoder)"
notes: >
Current file in repo is a 120-byte placeholder.
Must be replaced with actual weights before production release.
# ---------------------------------------------------------------------------
# JDK (for Ghidra)
# ---------------------------------------------------------------------------
jdk:
name: "Eclipse Temurin JRE 17"
category: "binary-analysis"
required: false # only if GhidraOptions__Enabled=true
source: "https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jre_x64_linux_hotspot_17.0.13_11.tar.gz"
license: "GPL-2.0-with-classpath-exception"
destination: "/opt/java/openjdk/"
env_override: "GhidraOptions__JavaHome"
size_approx: "55 MB"
sha256: "PENDING" # TODO: pin after first verified download
used_by:
- "StellaOps.BinaryIndex.Ghidra (GhidraHeadlessManager)"
    notes: >
      GPL-2.0 with the Classpath Exception permits linking without
      imposing copyleft on the linking code.
      Only needed for deployments using Ghidra binary analysis.
# ---------------------------------------------------------------------------
# Ghidra
# ---------------------------------------------------------------------------
ghidra:
name: "Ghidra 11.2 PUBLIC"
category: "binary-analysis"
required: false # only if GhidraOptions__Enabled=true
source: "https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20241105.zip"
license: "Apache-2.0"
destination: "/opt/ghidra/"
env_override: "GhidraOptions__GhidraHome"
size_approx: "1.5 GB"
sha256: "PENDING" # TODO: pin after first verified download
used_by:
- "StellaOps.BinaryIndex.Ghidra (GhidraService, GhidraHeadlessManager)"
notes: >
Full Ghidra installation with analyzers, BSim, and Version Tracking.
Disable with GhidraOptions__Enabled=false to skip entirely.
# ---------------------------------------------------------------------------
# Certificates (development defaults — replace for production)
# ---------------------------------------------------------------------------
dev-certificates:
name: "Development TLS certificates"
category: "certificates"
required: true
source: "local" # shipped in etc/authority/keys/
destination: "etc/authority/keys/"
runtime_path: "/app/etc/certs/"
env_override: "Kestrel__Certificates__Default__Path"
mount: "ro"
used_by:
- "All services (Kestrel TLS)"
notes: >
Dev-only. Replace with production certificates before deployment.
See docs/SECURITY_HARDENING_GUIDE.md.
trust-bundle:
name: "CA trust bundle"
category: "certificates"
required: true
source: "local" # shipped in etc/trust-profiles/assets/
destination: "etc/trust-profiles/assets/"
runtime_path: "/etc/ssl/certs/ca-certificates.crt"
mount: "ro"
used_by:
- "All services (HTTPS verification, attestation)"
    notes: >
      Combined CA bundle. For regional deployments, include additional
      trust anchors (russian_trusted_bundle.pem, etc.).
rekor-public-key:
name: "Rekor transparency log public key"
category: "certificates"
required: true # for Sigstore verification
source: "local"
destination: "etc/trust-profiles/assets/rekor-public.pem"
used_by:
- "Attestor (Sigstore receipt verification)"
- "AirGapTrustStoreIntegration"
# ---------------------------------------------------------------------------
# Regional crypto configuration
# ---------------------------------------------------------------------------
crypto-profiles:
name: "Regional crypto configuration"
category: "configuration"
required: false # only for regional compliance
source: "local"
files:
- "etc/appsettings.crypto.international.yaml"
- "etc/appsettings.crypto.eu.yaml"
- "etc/appsettings.crypto.russia.yaml"
- "etc/appsettings.crypto.china.yaml"
- "etc/crypto-plugins-manifest.json"
used_by:
- "All services (crypto provider selection)"
notes: >
Selected via compose overlay (docker-compose.compliance-*.yml).
See devops/compose/README.md.
# ---------------------------------------------------------------------------
# Evidence storage
# ---------------------------------------------------------------------------
evidence-storage:
name: "Evidence object store"
category: "persistent-storage"
required: true
type: "volume"
runtime_path: "/data/evidence"
env_override: "EvidenceLocker__ObjectStore__FileSystem__RootPath"
mount: "rw"
sizing: "~1 GB per 1000 scans"
used_by:
- "EvidenceLocker"
- "Attestor"
notes: >
Persistent named volume. Content-addressed, append-only.
Include in backup strategy.
# ---------------------------------------------------------------------------
# Search seed snapshots (included in dotnet publish — no acquisition needed)
# ---------------------------------------------------------------------------
search-snapshots:
name: "Unified search seed snapshots"
category: "search-data"
required: true
source: "included" # part of dotnet publish output
destination: "src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/"
files:
- "findings.snapshot.json"
- "vex.snapshot.json"
- "policy.snapshot.json"
- "graph.snapshot.json"
- "scanner.snapshot.json"
- "opsmemory.snapshot.json"
- "timeline.snapshot.json"
used_by:
- "UnifiedSearchIndexer (bootstrap on first start)"
notes: >
Copied to output by .csproj Content items.
Live data adapters refresh the index every 300s at runtime.
# ---------------------------------------------------------------------------
# Translations (included in Angular build — no acquisition needed)
# ---------------------------------------------------------------------------
translations:
name: "UI translation bundles"
category: "i18n"
required: true
source: "included" # part of Angular dist build
destination: "src/Web/StellaOps.Web/src/i18n/"
locales:
- "en-US"
- "de-DE"
- "bg-BG"
- "ru-RU"
- "es-ES"
- "fr-FR"
- "uk-UA"
- "zh-CN"
- "zh-TW"
used_by:
- "Console (Angular frontend)"
- "TranslationRegistry (backend override)"
notes: >
Baked into Angular dist bundle. Backend can override via
database-backed ITranslationBundleProvider (priority 100).