# Runtime Data Assets

Runtime data assets are files that Stella Ops services need at runtime but that are **not produced by `dotnet publish`** or the Angular build. They must be provisioned separately — either baked into Docker images, mounted as volumes, or supplied via an init container. This directory contains the canonical inventory, acquisition scripts, and packaging tools for all such assets.

**If you are setting up Stella Ops for the first time**, read this document before running `docker compose up`. Services will start without these assets but will operate in degraded mode (no semantic search, no binary analysis, dev-only certificates).

---

## Quick reference

| Category | Required? | Size | Provisioned by |
|---|---|---|---|
| [ML model weights](#1-ml-model-weights) | Yes (for semantic search) | ~80 MB | `acquire.sh` |
| [JDK + Ghidra](#2-jdk--ghidra) | Optional (binary analysis) | ~1.6 GB | `acquire.sh` |
| [Search seed snapshots](#3-search-seed-snapshots) | Yes (first boot) | ~7 KB | Included in source |
| [Translations (i18n)](#4-translations-i18n) | Yes | ~500 KB | Baked into Angular dist |
| [Certificates and trust stores](#5-certificates-and-trust-stores) | Yes | ~50 KB | `etc/` + volume mounts |
| [Regional crypto configuration](#6-regional-crypto-configuration) | Per region | ~20 KB | Compose overlays |
| [Evidence storage](#7-evidence-storage) | Yes | Grows | Persistent named volume |
| [Vulnerability feeds](#8-vulnerability-feeds) | Yes (offline) | ~300 MB | Offline Kit (`docs/OFFLINE_KIT.md`) |

---

## 1. ML model weights

**What:** The `all-MiniLM-L6-v2` sentence-transformer model in ONNX format, used by `OnnxVectorEncoder` for semantic vector search in AdvisoryAI.

**License:** Apache-2.0 (compatible with BUSL-1.1; see `third-party-licenses/all-MiniLM-L6-v2-Apache-2.0.txt`).

**Where it goes:**

```
/models/all-MiniLM-L6-v2.onnx
```

Configurable via the `KnowledgeSearch__OnnxModelPath` environment variable.
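Before wiring the path up, it can be worth confirming that the file actually holds real weights rather than a tiny placeholder. A minimal sketch, assuming the ~80 MB size quoted in the quick-reference table; the default path and the 1 MB threshold here are illustrative choices, not project settings:

```shell
#!/bin/sh
# Warn when the ONNX model is absent or far too small to be real weights.
# The path argument and the 1 MB cutoff are illustrative assumptions.
MODEL="${1:-src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx}"
if [ -f "$MODEL" ]; then
  SIZE=$(wc -c < "$MODEL")
else
  SIZE=0
fi
if [ "$SIZE" -lt 1000000 ]; then
  echo "WARNING: $MODEL missing or placeholder ($SIZE bytes); expect degraded semantic search" >&2
else
  echo "OK: $MODEL ($SIZE bytes)"
fi
```

Running this in CI before `docker build` catches the placeholder-weights case called out in the release checklist below.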
**How to acquire:**

```bash
# Option A: use the acquisition script (recommended)
./devops/runtime-assets/acquire.sh --models

# Option B: manual download
mkdir -p src/AdvisoryAI/StellaOps.AdvisoryAI/models
curl -L https://huggingface.co/sentence-transformers/all-MiniLM-L6-v2/resolve/main/onnx/model.onnx \
  -o src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
```

**Verification:**

```bash
sha256sum src/AdvisoryAI/StellaOps.AdvisoryAI/models/all-MiniLM-L6-v2.onnx
# Expected: see manifest.yaml for pinned digest
```

**Degraded mode:** If the model file is missing or is a placeholder, the encoder falls back to a deterministic character-ngram projection. Search works but semantic quality is significantly reduced.

**Docker / Compose mount:**

```yaml
services:
  advisory-ai-web:
    volumes:
      - ml-models:/app/models:ro

volumes:
  ml-models:
    driver: local
```

**Air-gap:** Include the `.onnx` file in the Offline Kit under `models/all-MiniLM-L6-v2.onnx`. The `acquire.sh --package` command produces a verified tarball for sneakernet transfer.

---

## 2. JDK + Ghidra

**What:** OpenJDK 17+ runtime and Ghidra 11.x installation for headless binary analysis (decompilation, BSim similarity, call-graph extraction).

**License:** OpenJDK — GPLv2+CE (Classpath Exception, allows linking); Ghidra — Apache-2.0 (NSA release).

**Required only when:** `GhidraOptions__Enabled=true` (default). Set to `false` to skip entirely if binary analysis is not needed.
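The toggle can be set through the environment like any other option; a hedged example, assuming the standard double-underscore environment-variable binding used by the other options in this document (the option name itself is from this section):

```shell
# Skip the JDK/Ghidra requirement entirely; binary analysis features
# are disabled and the service will not look for JAVA_HOME or GhidraHome.
export GhidraOptions__Enabled=false
```

In Compose this would go under the service's `environment:` block instead of a shell `export`.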
**Where it goes:**

```
/opt/java/openjdk/       # JDK installation (JAVA_HOME)
/opt/ghidra/             # Ghidra installation (GhidraOptions__GhidraHome)
/tmp/stellaops-ghidra/   # Workspace (GhidraOptions__WorkDir) — writable
```

**How to acquire:**

```bash
# Option A: use the acquisition script
./devops/runtime-assets/acquire.sh --ghidra

# Option B: manual
# JDK (Eclipse Temurin 17)
curl -L https://github.com/adoptium/temurin17-binaries/releases/download/jdk-17.0.13%2B11/OpenJDK17U-jre_x64_linux_hotspot_17.0.13_11.tar.gz \
  | tar -xz -C /opt/java/

# Ghidra 11.2
curl -L https://github.com/NationalSecurityAgency/ghidra/releases/download/Ghidra_11.2_build/ghidra_11.2_PUBLIC_20241105.zip \
  -o ghidra.zip && unzip ghidra.zip -d /opt/ghidra/
```

**Docker:** For services that need Ghidra, use a dedicated Dockerfile stage or a sidecar data image. See `docs/modules/binary-index/ghidra-deployment.md`.

**Air-gap:** Pre-download both archives on a connected machine and include them in the Offline Kit under `tools/jdk/` and `tools/ghidra/`.

---

## 3. Search seed snapshots

**What:** Small JSON files that bootstrap the unified search index on first start. Without them, search returns empty results until live data adapters populate the index.

**Where they are:**

```
src/AdvisoryAI/StellaOps.AdvisoryAI/UnifiedSearch/Snapshots/
  findings.snapshot.json   (1.3 KB)
  vex.snapshot.json        (1.2 KB)
  policy.snapshot.json     (1.2 KB)
  graph.snapshot.json      (758 B)
  scanner.snapshot.json    (751 B)
  opsmemory.snapshot.json  (1.1 KB)
  timeline.snapshot.json   (824 B)
```

**How they get into the image:** The `.csproj` copies them to the output directory via MSBuild item entries, so they are included in `dotnet publish` output automatically.

**Runtime behavior:** `UnifiedSearchIndexer` loads them at startup and refreshes from live data adapters every 300 seconds (`UnifiedSearch__AutoRefreshIntervalSeconds`).
**No separate provisioning needed** unless you want to supply custom seed data, in which case mount a volume at the snapshot path and set:

```
KnowledgeSearch__UnifiedFindingsSnapshotPath=/app/snapshots/findings.snapshot.json
```

---

## 4. Translations (i18n)

**What:** JSON translation bundles for the Angular frontend, supporting 9 locales: en-US, de-DE, bg-BG, ru-RU, es-ES, fr-FR, uk-UA, zh-CN, zh-TW.

**Where they are:**

```
src/Web/StellaOps.Web/src/i18n/*.common.json
```

**How they get into the image:** Compiled into the Angular `dist/` bundle during `npm run build`. The console Docker image (`devops/docker/Dockerfile.console`) includes them automatically.

**Runtime overrides:** The backend `TranslationRegistry` supports database-backed translation overrides (priority 100) over file-based bundles (priority 10). For custom translations in offline environments, seed the database or mount override JSON files.

**No separate provisioning needed** for standard deployments.

---

## 5. Certificates and trust stores

**What:** TLS certificates, signing keys, and CA trust bundles for inter-service communication and attestation verification.

**Development defaults (not for production):**

```
etc/authority/keys/
  kestrel-dev.pfx          # Kestrel TLS (password: devpass)
  kestrel-dev.crt / .key
  ack-token-dev.pem        # Token signing key
  signing-dev.pem          # Service signing key
etc/trust-profiles/assets/
  ca.crt                   # Root CA bundle
  rekor-public.pem         # Rekor transparency log public key
```

**Compose mounts (already configured):**

```yaml
volumes:
  - ../../etc/authority/keys:/app/etc/certs:ro
  - ./combined-ca-bundle.crt:/etc/ssl/certs/ca-certificates.crt:ro
```

**Production:** Replace dev certificates with properly issued certificates. Mount as read-only volumes. See `docs/SECURITY_HARDENING_GUIDE.md`.

**Air-gap:** Include the full trust chain in the Offline Kit. For Russian deployments, include `certificates/russian_trusted_bundle.pem` (see `docs/OFFLINE_KIT.md`).

---

## 6. Regional crypto configuration

**What:** YAML configuration files that select the cryptographic profile (algorithms, key types, HSM settings) per deployment region.

**Files:**

```
etc/appsettings.crypto.international.yaml  # Default (ECDSA/RSA/EdDSA)
etc/appsettings.crypto.eu.yaml             # eIDAS qualified signatures
etc/appsettings.crypto.russia.yaml         # GOST R 34.10/34.11
etc/appsettings.crypto.china.yaml          # SM2/SM3/SM4
etc/crypto-plugins-manifest.json           # Plugin registry
```

**Selection:** Via Docker Compose overlays:

```bash
# EU deployment
docker compose -f docker-compose.stella-ops.yml \
  -f docker-compose.compliance-eu.yml up -d
```

**No separate provisioning needed** — files ship in the source tree and are selected by compose overlay. See `devops/compose/README.md` for details.

---

## 7. Evidence storage

**What:** Persistent storage for evidence bundles (SBOMs, attestations, signatures, scan proofs). Grows with usage.

**Default path:** `/data/evidence` (named volume `evidence-data`).

**Configured via:** `EvidenceLocker__ObjectStore__FileSystem__RootPath`

**Compose (already configured):**

```yaml
volumes:
  evidence-data:
    driver: local
```

**Sizing:** Plan ~1 GB per 1000 scans as a rough baseline. Monitor with the Prometheus metric `evidence_locker_storage_bytes_total`.

**Backup:** Include in the PostgreSQL backup strategy. Evidence files are content-addressed and immutable — append-only, safe to rsync.

---

## 8. Vulnerability feeds

**What:** Merged advisory feeds (OSV, GHSA, NVD 2.0, and regional feeds). Required for offline vulnerability matching.

**Provisioned by:** The Offline Update Kit (`docs/OFFLINE_KIT.md`). This is a separate, well-documented workflow. See that document for full details.

**Not covered by `acquire.sh`** — feed management is handled by the Concelier module and the Offline Kit import pipeline.

---

## Acquisition script

The `acquire.sh` script automates downloading, verifying, and staging runtime data assets. It is idempotent — safe to run multiple times.
```bash
# Acquire everything (models + Ghidra + JDK)
./devops/runtime-assets/acquire.sh --all

# Models only (for environments without binary analysis)
./devops/runtime-assets/acquire.sh --models

# Ghidra + JDK only
./devops/runtime-assets/acquire.sh --ghidra

# Package all acquired assets into a portable tarball for air-gap transfer
./devops/runtime-assets/acquire.sh --package

# Verify already-acquired assets against pinned checksums
./devops/runtime-assets/acquire.sh --verify
```

Asset checksums are pinned in `manifest.yaml` in this directory. The script verifies SHA-256 digests after every download and refuses corrupted files.

---

## Docker integration

### Option A: Bake into image (simplest)

Run `acquire.sh --models` before `docker build`. The `.csproj` copies `models/all-MiniLM-L6-v2.onnx` into the publish output automatically.

### Option B: Shared data volume (recommended for production)

Build a lightweight data image or use an init container:

```dockerfile
# Dockerfile.runtime-assets
FROM busybox:1.37
COPY models/ /data/models/
VOLUME /data/models
```

Mount in compose:

```yaml
services:
  advisory-ai-web:
    volumes:
      - runtime-assets:/app/models:ro
    depends_on:
      runtime-assets-init:
        condition: service_completed_successfully

  runtime-assets-init:
    build:
      context: .
      dockerfile: devops/runtime-assets/Dockerfile.runtime-assets
    volumes:
      - runtime-assets:/data/models

volumes:
  runtime-assets:
```

### Option C: Air-gap tarball

```bash
./devops/runtime-assets/acquire.sh --package
# Produces: out/runtime-assets/stella-ops-runtime-assets-<version>.tar.gz

# Transfer to air-gapped host, then:
tar -xzf stella-ops-runtime-assets-*.tar.gz -C /opt/stellaops/
```

---

## Checklist: before you ship a release

- [ ] `models/all-MiniLM-L6-v2.onnx` contains real weights (not the 120-byte placeholder)
- [ ] `acquire.sh --verify` passes all checksums
- [ ] Certificates are production-issued (not `*-dev.*`)
- [ ] Evidence storage volume is provisioned with adequate capacity
- [ ] Regional crypto profile is selected if applicable
- [ ] Offline Kit includes runtime assets tarball if deploying to air-gap
- [ ] `NOTICE.md` and `third-party-licenses/` are included in the image

---

## Related documentation

- Installation guide: `docs/INSTALL_GUIDE.md`
- Offline Update Kit: `docs/OFFLINE_KIT.md`
- Security hardening: `docs/SECURITY_HARDENING_GUIDE.md`
- Ghidra deployment: `docs/modules/binary-index/ghidra-deployment.md`
- LLM model bundles (separate from ONNX): `docs/modules/advisory-ai/guides/offline-model-bundles.md`
- Third-party dependencies: `docs/legal/THIRD-PARTY-DEPENDENCIES.md`
- Compose profiles: `devops/compose/README.md`