git.stella-ops.org/devops/runtime-assets/README.md

Runtime Data Assets

Runtime data assets are files that Stella Ops services need at runtime but that are not produced by dotnet publish or the Angular build. They must be provisioned separately — either baked into Docker images, mounted as volumes, or supplied via an init container.

This directory contains the canonical inventory, acquisition scripts, and packaging tools for all such assets.

If you are setting up Stella Ops for the first time, read this document before running docker compose up. Services will start without these assets but will operate in degraded mode (no semantic search, no binary analysis, dev-only certificates).


Quick reference

| Category | Required? | Size | Provisioned by |
| --- | --- | --- | --- |
| ML model weights | Yes (for semantic search) | ~80 MB | acquire.sh |
| JDK + Ghidra | Optional (binary analysis) | ~1.6 GB | acquire.sh |
| Search seed snapshots | Yes (first boot) | ~7 KB | Included in source |
| Translations (i18n) | Yes | ~500 KB | Baked into Angular dist |
| Certificates and trust stores | Yes | ~50 KB | etc/ + volume mounts |
| Regional crypto configuration | Per region | ~20 KB | Compose overlays |
| Evidence storage | Yes | Grows | Persistent named volume |
| Vulnerability feeds | Yes (offline) | ~300 MB | Offline Kit (docs/OFFLINE_KIT.md) |

1. ML model weights

What: The all-MiniLM-L6-v2 sentence-transformer model in ONNX format, used by OnnxVectorEncoder for semantic vector search in AdvisoryAI.

License: Apache-2.0 (compatible with BUSL-1.1; see third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt).

Where it goes:

<app-root>/models/all-MiniLM-L6-v2.onnx

Configurable via KnowledgeSearch__OnnxModelPath environment variable.

How to acquire:

# Option A: use the acquisition script (recommended)
./devops/runtime-assets/acquire.sh --models

# Option B: manual download
mkdir -p src/AdvisoryAI/StellaOps.AdvisoryAI/models
curl -L https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx \
  -o src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx

Verification:

sha256sum src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
# Expected: see manifest.yaml for pinned digest

Degraded mode: If the model file is missing or is a placeholder, the encoder falls back to a deterministic character-ngram projection. Search works but semantic quality is significantly reduced.
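A quick way to tell real weights from the placeholder is a file-size check. This is a sketch; the 1 MiB threshold is an illustrative assumption (the real model is ~80 MB, the source-tree placeholder only ~120 bytes):

```shell
# check_onnx_model: succeed only if the file exists and is plausibly real
# weights. The 1 MiB threshold is an assumption for illustration -- the
# real model is ~80 MB, the placeholder ~120 bytes.
check_onnx_model() {
  model="$1"
  if [ ! -f "$model" ]; then
    echo "missing: $model"
    return 1
  fi
  size=$(wc -c < "$model")
  if [ "$size" -lt 1048576 ]; then
    echo "placeholder: $model is only $size bytes"
    return 2
  fi
  echo "ok: $model ($size bytes)"
}

# Example:
# check_onnx_model src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
```

Wiring this into a container healthcheck or CI step catches the placeholder before it silently degrades search quality in production.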

Docker / Compose mount:

services:
  advisory-ai-web:
    volumes:
      - ml-models:/app/models:ro

volumes:
  ml-models:
    driver: local

Air-gap: Include the .onnx file in the Offline Kit under models/all-MiniLM-L6-v2.onnx. The acquire.sh --package command produces a verified tarball for sneakernet transfer.


2. JDK + Ghidra

What: OpenJDK 17+ runtime and Ghidra 11.x installation for headless binary analysis (decompilation, BSim similarity, call-graph extraction).

License: OpenJDK — GPLv2+CE (Classpath Exception, allows linking); Ghidra — Apache-2.0 (NSA release).

Required only when: GhidraOptions__Enabled=true (the default). Set it to false to skip this section entirely if binary analysis is not needed.

Where it goes:

/opt/java/openjdk/          # JDK installation (JAVA_HOME)
/opt/ghidra/                # Ghidra installation (GhidraOptions__GhidraHome)
/tmp/stellaops-ghidra/      # Workspace (GhidraOptions__WorkDir) — writable

How to acquire:

# Option A: use the acquisition script
./devops/runtime-assets/acquire.sh --ghidra

# Option B: manual
# JDK (Eclipse Temurin 17)
curl -L https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jre_x64_linux_hotspot_17.0.13_11.tar.gz \
  | tar -xz -C /opt/java/

# Ghidra 11.2
curl -L https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20241105.zip \
  -o ghidra.zip && unzip ghidra.zip -d /opt/ghidra/
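After extraction, it is worth sanity-checking that the headless launcher landed where the service expects it. A small sketch; the versioned `ghidra_11.2_PUBLIC` subdirectory name varies by release, which is why this searches rather than hard-coding the path:

```shell
# check_ghidra_home: look for Ghidra's analyzeHeadless launcher under the
# install root (GhidraOptions__GhidraHome). Depth 3 covers the usual
# <root>/ghidra_<version>_PUBLIC/support/analyzeHeadless layout.
check_ghidra_home() {
  root="$1"
  launcher=$(find "$root" -maxdepth 3 -type f -name analyzeHeadless 2>/dev/null | head -n 1)
  if [ -n "$launcher" ]; then
    echo "found: $launcher"
  else
    echo "missing: no analyzeHeadless under $root"
    return 1
  fi
}

# Example:
# check_ghidra_home /opt/ghidra
```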

Docker: For services that need Ghidra, use a dedicated Dockerfile stage or a sidecar data image. See docs/modules/binary-index/ghidra-deployment.md.

Air-gap: Pre-download both archives on a connected machine and include them in the Offline Kit under tools/jdk/ and tools/ghidra/.


3. Search seed snapshots

What: Small JSON files that bootstrap the unified search index on first start. Without them, search returns empty results until live data adapters populate the index.

Where they are:

src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/
  findings.snapshot.json       (1.3 KB)
  vex.snapshot.json            (1.2 KB)
  policy.snapshot.json         (1.2 KB)
  graph.snapshot.json          (758 B)
  scanner.snapshot.json        (751 B)
  opsmemory.snapshot.json      (1.1 KB)
  timeline.snapshot.json       (824 B)

How they get into the image: The .csproj copies them to the output directory via <Content> items. They are included in dotnet publish output automatically.

Runtime behavior: UnifiedSearchIndexer loads them at startup and refreshes from live data adapters every 300 seconds (UnifiedSearch__AutoRefreshIntervalSeconds).

No separate provisioning needed unless you want to supply custom seed data, in which case mount a volume at the snapshot path and set:

KnowledgeSearch__UnifiedFindingsSnapshotPath=/app/snapshots/findings.snapshot.json

4. Translations (i18n)

What: JSON translation bundles for the Angular frontend, supporting 9 locales: en-US, de-DE, bg-BG, ru-RU, es-ES, fr-FR, uk-UA, zh-CN, zh-TW.

Where they are:

src/Web/StellaOps.Web/src/i18n/*.common.json

How they get into the image: Compiled into the Angular dist/ bundle during npm run build. The console Docker image (devops/docker/Dockerfile.console) includes them automatically.

Runtime overrides: The backend TranslationRegistry supports database-backed translation overrides (priority 100) over file-based bundles (priority 10). For custom translations in offline environments, seed the database or mount override JSON files.

No separate provisioning needed for standard deployments.


5. Certificates and trust stores

What: TLS certificates, signing keys, and CA trust bundles for inter-service communication and attestation verification.

Development defaults (not for production):

etc/authority/keys/
  kestrel-dev.pfx              # Kestrel TLS (password: devpass)
  kestrel-dev.crt / .key
  ack-token-dev.pem            # Token signing key
  signing-dev.pem              # Service signing key

etc/trust-profiles/assets/
  ca.crt                       # Root CA bundle
  rekor-public.pem             # Rekor transparency log public key
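If the dev TLS material ever needs regenerating, a self-signed pair with the same filenames can be produced with openssl. Development only; the filenames and the devpass password mirror the defaults above, while the CN and key size are illustrative assumptions:

```shell
# gen_dev_certs: regenerate development TLS material (NOT for production).
# Filenames mirror etc/authority/keys/; CN and key size are assumptions.
gen_dev_certs() {
  outdir="$1"
  mkdir -p "$outdir"
  # Self-signed cert + key, valid one year
  openssl req -x509 -newkey rsa:2048 -nodes \
    -subj "/CN=localhost" -days 365 \
    -keyout "$outdir/kestrel-dev.key" \
    -out "$outdir/kestrel-dev.crt" 2>/dev/null
  # Bundle into the PFX Kestrel loads (password matches the dev default)
  openssl pkcs12 -export \
    -inkey "$outdir/kestrel-dev.key" \
    -in "$outdir/kestrel-dev.crt" \
    -passout pass:devpass \
    -out "$outdir/kestrel-dev.pfx"
}

# Example:
# gen_dev_certs etc/authority/keys
```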

Compose mounts (already configured):

volumes:
  - ../../etc/authority/keys:/app/etc/certs:ro
  - ./combined-ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:ro

Production: Replace dev certificates with properly issued certificates. Mount as read-only volumes. See docs/SECURITY_HARDENING_GUIDE.md.

Air-gap: Include the full trust chain in the Offline Kit. For Russian deployments, include certificates/russian_trusted_bundle.pem (see docs/OFFLINE_KIT.md).


6. Regional crypto configuration

What: YAML configuration files that select the cryptographic profile (algorithms, key types, HSM settings) per deployment region.

Files:

etc/appsettings.crypto.international.yaml   # Default (ECDSA/RSA/EdDSA)
etc/appsettings.crypto.eu.yaml              # eIDAS qualified signatures
etc/appsettings.crypto.russia.yaml          # GOST R 34.10/34.11
etc/appsettings.crypto.china.yaml           # SM2/SM3/SM4
etc/crypto-plugins-manifest.json            # Plugin registry

Selection: Via Docker Compose overlays:

# EU deployment
docker compose -f docker-compose.stella-ops.yml \
               -f docker-compose.compliance-eu.yml up -d

No separate provisioning needed — files ship in the source tree and are selected by compose overlay. See devops/compose/README.md for details.


7. Evidence storage

What: Persistent storage for evidence bundles (SBOMs, attestations, signatures, scan proofs). Grows with usage.

Default path: /data/evidence (named volume evidence-data).

Configured via: EvidenceLocker__ObjectStore__FileSystem__RootPath

Compose (already configured):

volumes:
  evidence-data:
    driver: local

Sizing: Plan ~1 GB per 1000 scans as a rough baseline. Monitor with Prometheus metric evidence_locker_storage_bytes_total.
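The baseline above turns into a quick back-of-the-envelope estimate. The 1 GB per 1000 scans figure comes from this section; scan rate and retention are inputs you supply:

```shell
# estimate_evidence_gb: rough capacity estimate from the ~1 GB / 1000 scans
# baseline above. Integer arithmetic; treat the result as a floor, not a cap.
estimate_evidence_gb() {
  scans_per_day=$1
  retention_days=$2
  echo $(( scans_per_day * retention_days / 1000 ))
}

estimate_evidence_gb 200 90   # 200 scans/day kept 90 days -> 18 GB
```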

Backup: Include the evidence volume in the same backup strategy as PostgreSQL. Evidence files are content-addressed and immutable (append-only), so incremental rsync is safe.


8. Vulnerability feeds

What: Merged advisory feeds (OSV, GHSA, NVD 2.0, and regional feeds). Required for offline vulnerability matching.

Provisioned by: The Offline Update Kit (docs/OFFLINE_KIT.md). This is a separate, well-documented workflow. See that document for full details.

Not covered by acquire.sh — feed management is handled by the Concelier module and the Offline Kit import pipeline.


Acquisition script

The acquire.sh script automates downloading, verifying, and staging runtime data assets. It is idempotent — safe to run multiple times.

# Acquire everything (models + Ghidra + JDK)
./devops/runtime-assets/acquire.sh --all

# Models only (for environments without binary analysis)
./devops/runtime-assets/acquire.sh --models

# Ghidra + JDK only
./devops/runtime-assets/acquire.sh --ghidra

# Package all acquired assets into a portable tarball for air-gap transfer
./devops/runtime-assets/acquire.sh --package

# Verify already-acquired assets against pinned checksums
./devops/runtime-assets/acquire.sh --verify

Asset checksums are pinned in manifest.yaml in this directory. The script verifies SHA-256 digests after every download and refuses corrupted files.
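Conceptually, the per-file check reduces to comparing a computed digest against the pinned one. A sketch of that step; acquire.sh's actual implementation may differ:

```shell
# verify_sha256: compare a file's SHA-256 against an expected digest --
# conceptually how each pinned manifest.yaml entry is checked.
verify_sha256() {
  file="$1"
  expected="$2"
  actual=$(sha256sum "$file" | awk '{print $1}')
  if [ "$actual" = "$expected" ]; then
    echo "ok: $file"
  else
    echo "MISMATCH: $file (got $actual, want $expected)"
    return 1
  fi
}

# Example:
# verify_sha256 models/all-MiniLM-L6-v2.onnx "<digest from manifest.yaml>"
```

Refusing a mismatched file (non-zero return) rather than warning is deliberate: a truncated download of an 80 MB model is indistinguishable from corruption without the digest.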


Docker integration

Option A: Bake into image (simplest)

Run acquire.sh --models before docker build. The .csproj copies models/all-MiniLM-L6-v2.onnx into the publish output automatically.

Option B: Data image + init container

Build a lightweight data image and mount it through an init container:

# Dockerfile.runtime-assets
FROM busybox:1.37
COPY models/ /data/models/
VOLUME /data/models

Mount in compose:

services:
  advisory-ai-web:
    volumes:
      - runtime-assets:/app/models:ro
    depends_on:
      runtime-assets-init:
        condition: service_completed_successfully

  runtime-assets-init:
    build:
      context: .
      dockerfile: devops/runtime-assets/Dockerfile.runtime-assets
    volumes:
      - runtime-assets:/data/models

volumes:
  runtime-assets:

Option C: Air-gap tarball

./devops/runtime-assets/acquire.sh --package
# Produces: out/runtime-assets/stella-ops-runtime-assets-<date>.tar.gz
# Transfer to air-gapped host, then:
tar -xzf stella-ops-runtime-assets-*.tar.gz -C /opt/stellaops/

Checklist: before you ship a release

  • models/all-MiniLM-L6-v2.onnx contains real weights (not the 120-byte placeholder)
  • acquire.sh --verify passes all checksums
  • Certificates are production-issued (not *-dev.*)
  • Evidence storage volume is provisioned with adequate capacity
  • Regional crypto profile is selected if applicable
  • Offline Kit includes runtime assets tarball if deploying to air-gap
  • NOTICE.md and third-party-licenses/ are included in the image

Related documentation

  • Installation guide: docs/INSTALL_GUIDE.md
  • Offline Update Kit: docs/OFFLINE_KIT.md
  • Security hardening: docs/SECURITY_HARDENING_GUIDE.md
  • Ghidra deployment: docs/modules/binary-index/ghidra-deployment.md
  • LLM model bundles (separate from ONNX): docs/modules/advisory-ai/guides/offline-model-bundles.md
  • Third-party dependencies: docs/legal/THIRD-PARTY-DEPENDENCIES.md
  • Compose profiles: devops/compose/README.md