# Surface.FS Design (Epic: SURFACE-SHARING)

> **Status:** Draft v1.0 — aligns with tasks `SURFACE-FS-01..06`, `SCANNER-SURFACE-01..05`, `ZASTAVA-SURFACE-01..02`, `SCHED-SURFACE-01`, `OPS-SECRETS-01..02`.
>
> **Audience:** Scanner Worker/WebService, Zastava, Scheduler, DevOps.

## 1. Purpose

Surface.FS provides a unified content-addressable cache for Scanner-derived artefacts (layer manifests, entry traces, SBOM fragments, runtime deltas). It enables:

- Sharing scan results between Worker, WebService, Zastava Observer/Webhook, Scheduler planners, Export Center, and future CLI operations.
- Deterministic reproduction of scan evidence (manifests and payloads) in both connected and air-gapped environments.
- Efficient data movement by storing manifests once and referencing them via stable pointers.

## 2. Core Concepts

### 2.1 Artefact Key

Each artefact is addressed by the tuple `(tenant, surfaceKind, contentDigest)`, where `contentDigest` is the SHA-256 of the canonical payload. `surfaceKind` identifies the artefact type (see the manifest schema below).
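
As a minimal sketch (assuming SHA-256 over the canonical payload bytes; the helper name here is illustrative, not the shipped API), the key tuple can be derived like this:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Hypothetical illustration: derive the (tenant, surfaceKind, contentDigest) tuple for a
// canonical payload. The shipped Surface.FS library owns the real canonicalisation logic.
static class ArtefactKeyExample
{
    public static (string Tenant, string Kind, string Digest) Build(
        string tenant, string surfaceKind, byte[] canonicalPayload)
    {
        // SHA-256 over the canonical payload bytes, rendered as "sha256:<hex>".
        byte[] hash = SHA256.HashData(canonicalPayload);
        string digest = "sha256:" + Convert.ToHexString(hash).ToLowerInvariant();
        return (tenant, surfaceKind, digest);
    }
}

// Example: Build("acme", "layer-entry-trace", Encoding.UTF8.GetBytes(canonicalJson))
// yields ("acme", "layer-entry-trace", "sha256:ab12...").
```
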
### 2.2 Manifest

Manifests describe the artefact metadata and storage pointers. They are stored in the `surface-manifests` bucket and fetched by consumers before retrieving bulk data.

```json
{
  "schema": "stellaops.surface.manifest@1",
  "tenant": "acme",
  "kind": "layer-entry-trace",
  "digest": "sha256:ab12...",
  "createdAt": "2025-10-29T12:00:00Z",
  "expiresAt": "2025-11-05T12:00:00Z",
  "source": {
    "scannerBuild": "stellaops/scanner@sha256:deadbeef",
    "imageDigest": "sha256:cafe...",
    "scanId": "scan-1234"
  },
  "storage": {
    "bucket": "surface-cache",
    "objectKey": "tenants/acme/layer-entry-trace/sha256/ab/12/.../payload.json.zst",
    "sizeBytes": 524288,
    "contentType": "application/json+zstd"
  },
  "integrity": {
    "hash": "sha256:ab12...",
    "signature": null
  }
}
```
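
For .NET consumers the schema above maps onto a small set of records; the following is an illustrative sketch inferred from the example payload, not the library's actual contract types:

```csharp
using System;
using System.Text.Json.Serialization;

// Illustrative POCOs mirroring stellaops.surface.manifest@1; not the shipped types.
// Deserialize with System.Text.Json, e.g. JsonSerializer.Deserialize<SurfaceManifest>(json).
public sealed record SurfaceManifest(
    [property: JsonPropertyName("schema")] string Schema,
    [property: JsonPropertyName("tenant")] string Tenant,
    [property: JsonPropertyName("kind")] string Kind,
    [property: JsonPropertyName("digest")] string Digest,
    [property: JsonPropertyName("createdAt")] DateTimeOffset CreatedAt,
    [property: JsonPropertyName("expiresAt")] DateTimeOffset? ExpiresAt,
    [property: JsonPropertyName("source")] SurfaceSource Source,
    [property: JsonPropertyName("storage")] SurfaceStorage Storage,
    [property: JsonPropertyName("integrity")] SurfaceIntegrity Integrity);

public sealed record SurfaceSource(
    [property: JsonPropertyName("scannerBuild")] string ScannerBuild,
    [property: JsonPropertyName("imageDigest")] string ImageDigest,
    [property: JsonPropertyName("scanId")] string ScanId);

public sealed record SurfaceStorage(
    [property: JsonPropertyName("bucket")] string Bucket,
    [property: JsonPropertyName("objectKey")] string ObjectKey,
    [property: JsonPropertyName("sizeBytes")] long SizeBytes,
    [property: JsonPropertyName("contentType")] string ContentType);

public sealed record SurfaceIntegrity(
    [property: JsonPropertyName("hash")] string Hash,
    [property: JsonPropertyName("signature")] string? Signature);
```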

### 2.3 Payload Storage

Large payloads (SBOM fragments, entry traces, runtime events) live in the same object store as manifests (RustFS/S3). Manifests record relative paths so offline bundles can copy both manifest and payload without modification.
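
As a hedged illustration, the relative `objectKey` in the example above can be composed from the digest roughly as follows (the real layout is owned by `SurfacePathBuilder`, see §4; the two-level sharding is inferred from the sample path):

```csharp
// Illustration only: compose a content-addressed object key shaped like the sample
// "tenants/{tenant}/{kind}/sha256/{aa}/{bb}/{rest}/payload.json.zst" above.
// The shipped SurfacePathBuilder is authoritative; the sharding depth is an assumption.
static class ObjectKeyExample
{
    public static string Build(string tenant, string kind, string digest)
    {
        string[] parts = digest.Split(':', 2); // "sha256:ab12..." -> ["sha256", "ab12..."]
        string algorithm = parts[0];
        string hex = parts[1];

        string shard1 = hex.Substring(0, 2);   // "ab"
        string shard2 = hex.Substring(2, 2);   // "12"
        string rest = hex.Substring(4);

        return $"tenants/{tenant}/{kind}/{algorithm}/{shard1}/{shard2}/{rest}/payload.json.zst";
    }
}
```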

## 3. APIs

Surface.FS exposes a gRPC/HTTP API consumed by .NET clients:

| Method | Description |
|--------|-------------|
| `PutManifest(PutManifestRequest)` | Stores manifest + optional payload. Idempotent via `digest`. |
| `GetManifest(GetManifestRequest)` | Returns manifest metadata; 404 if missing. |
| `GetPayload(GetPayloadRequest)` | Streams payload bytes (optionally decompressing). |
| `ListManifests(ListManifestRequest)` | Enumerates manifests for tenant/kind with pagination. |
| `DeleteManifest(DeleteManifestRequest)` | (Optional) Removes manifest/payload based on retention policies. |

The .NET client wraps these calls and handles retries using Polly policies.
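
A minimal sketch of the kind of Polly retry policy such a wrapper might apply; the policy values and the `GetManifestAsync` call shown in the usage comment are illustrative assumptions, not the actual client surface:

```csharp
using System;
using System.Net.Http;
using System.Threading.Tasks;
using Polly;
using Polly.Retry;

// Illustrative retry wrapper; the real Surface.FS client ships its own policies.
public static class SurfaceRetryExample
{
    // Exponential backoff on transient failures: 3 attempts with 1s/2s/4s delays.
    private static readonly AsyncRetryPolicy RetryPolicy = Policy
        .Handle<HttpRequestException>()
        .Or<TaskCanceledException>()
        .WaitAndRetryAsync(
            retryCount: 3,
            sleepDurationProvider: attempt => TimeSpan.FromSeconds(Math.Pow(2, attempt - 1)));

    public static Task<T> ExecuteAsync<T>(Func<Task<T>> call)
        => RetryPolicy.ExecuteAsync(call);
}

// Usage (hypothetical client method name):
//   var manifest = await SurfaceRetryExample.ExecuteAsync(
//       () => client.GetManifestAsync(tenant, kind, digest));
```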

### WebService integration (2025-11-05)

- `/api/v1/scans/{id}` and `/api/v1/reports` responses now include a `surface` block containing:
  - `manifestUri` – `cas://` pointer to the Surface manifest JSON.
  - `manifestDigest` – canonical SHA-256 over the manifest payload.
  - `manifest.artifacts[]` – deterministic list with `kind`, `uri`, `digest`, `mediaType`, `format`, and optional `view`. URIs reuse the `ArtifactObjectKeyBuilder` semantics (`cas://{bucket}/{rootPrefix}/images/...`).
- This allows UI/CLI consumers to fetch manifests or artefacts without additional Surface.FS round-trips; an illustrative deserialization sketch follows this list.
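
An illustrative (non-authoritative) C# shape for deserializing that `surface` block, derived only from the fields listed above:

```csharp
using System.Collections.Generic;
using System.Text.Json.Serialization;

// Illustrative types for the `surface` block in scan/report responses; field names
// follow the bullets above, but these are not the shipped contracts.
public sealed record SurfaceBlock(
    [property: JsonPropertyName("manifestUri")] string ManifestUri,        // cas:// pointer
    [property: JsonPropertyName("manifestDigest")] string ManifestDigest,  // canonical SHA-256
    [property: JsonPropertyName("manifest")] SurfaceManifestSummary Manifest);

public sealed record SurfaceManifestSummary(
    [property: JsonPropertyName("artifacts")] IReadOnlyList<SurfaceArtifact> Artifacts);

public sealed record SurfaceArtifact(
    [property: JsonPropertyName("kind")] string Kind,
    [property: JsonPropertyName("uri")] string Uri,
    [property: JsonPropertyName("digest")] string Digest,
    [property: JsonPropertyName("mediaType")] string MediaType,
    [property: JsonPropertyName("format")] string Format,
    [property: JsonPropertyName("view")] string? View);
```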

## 4. Library Responsibilities

The Surface.FS library for .NET hosts provides:

- `ISurfaceManifestWriter` / `ISurfaceManifestReader` interfaces.
- Content-addressed path builder (`SurfacePathBuilder`).
- Tenant namespace isolation and bucket configuration (via Surface.Env).
- Local cache abstraction `ISurfaceCache` with a default `FileSurfaceCache` implementation (uses `Surface:Cache:Root` / `SCANNER_SURFACE_CACHE_ROOT`, enforces quotas, serialises writes with per-key semaphores).
- `SurfaceCacheKey` helper that normalises cache entries as `{namespace}/{tenant}/{sha256}`. EntryTrace graphs use the `entrytrace.graph` namespace so Worker/WebService/CLI can share cached results deterministically (see the sketch after this list).
- Metrics: `surface_manifest_put_seconds`, `surface_manifest_cache_hit_total`, etc.
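
A hedged sketch of how these pieces could fit together; only the type names above come from the design, and every member signature below is an assumption for illustration, not the shipped contract:

```csharp
using System.IO;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical member shapes for the abstractions named in the list above.
// The signatures are illustrative assumptions; consult the library for the real API.
public interface ISurfaceManifestWriter
{
    Task WriteAsync(string manifestJson, Stream payload, CancellationToken ct = default);
}

public interface ISurfaceManifestReader
{
    Task<string?> ReadAsync(string tenant, string kind, string digest, CancellationToken ct = default);
}

public interface ISurfaceCache
{
    // Keys are normalised by SurfaceCacheKey as "{namespace}/{tenant}/{sha256}".
    Task<Stream?> TryGetAsync(string key, CancellationToken ct = default);
    Task PutAsync(string key, Stream content, CancellationToken ct = default);
}
```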

## 5. Retention & Eviction

- Manifests include an optional `expiresAt`; the Worker defaults to 30 days for SBOM fragments and 7 days for entry traces.
- Background job `SurfaceCacheMaintenanceService` evicts local cache entries exceeding quota, oldest-first (sketched below).
- Object storage retention policies are managed by DevOps; the library exposes metrics but does not auto-delete unless instructed.
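
A simplified sketch of quota-based, oldest-first eviction, assuming entries are plain files under the cache root ordered by last access time; this is not the `SurfaceCacheMaintenanceService` implementation:

```csharp
using System.IO;
using System.Linq;

// Simplified illustration of oldest-first eviction under a byte quota.
// Not the actual SurfaceCacheMaintenanceService; cacheRoot and quotaBytes come from the host.
static class CacheEvictionExample
{
    public static void EvictOldest(string cacheRoot, long quotaBytes)
    {
        var files = new DirectoryInfo(cacheRoot)
            .EnumerateFiles("*", SearchOption.AllDirectories)
            .OrderBy(f => f.LastAccessTimeUtc)   // oldest entries go first
            .ToList();

        long total = files.Sum(f => f.Length);
        foreach (var file in files)
        {
            if (total <= quotaBytes)
                break;                           // back under quota; stop evicting
            total -= file.Length;
            file.Delete();
        }
    }
}
```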

## 6. Offline Kit Handling

Offline kits include:

```
offline/surface/
  manifests/
    tenants/<tenant>/<kind>/<digest>.json
  payloads/
    tenants/<tenant>/<kind>/<digest>.json.zst
  manifest-index.json
```

The import script calls `PutManifest` for each manifest, verifying digests. This enables Zastava and Scheduler running offline to consume cached data without re-scanning.
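
An illustrative sketch of such an import loop, assuming the directory layout above; the digest cross-check shown is a minimal stand-in (a full verification would also re-hash the decompressed payload), and the commented `PutManifest` call shape is hypothetical:

```csharp
using System;
using System.IO;
using System.Text.Json;

// Illustrative import loop; the real import script and client API may differ. It walks
// manifests/, cross-checks the recorded digests, and replays PutManifest (see §3).
static class SurfaceBundleImportExample
{
    public static void Import(string root /* e.g. "offline/surface" */)
    {
        string manifestsRoot = Path.Combine(root, "manifests");
        foreach (string manifestPath in Directory.EnumerateFiles(
                     manifestsRoot, "*.json", SearchOption.AllDirectories))
        {
            string json = File.ReadAllText(manifestPath);
            using JsonDocument doc = JsonDocument.Parse(json);

            // Basic consistency check: the artefact digest must match integrity.hash.
            string digest = doc.RootElement.GetProperty("digest").GetString()!;
            string hash = doc.RootElement.GetProperty("integrity").GetProperty("hash").GetString()!;
            if (!string.Equals(digest, hash, StringComparison.OrdinalIgnoreCase))
                throw new InvalidOperationException($"Digest mismatch in {manifestPath}");

            // Payload lives under payloads/ with the same relative path plus ".zst".
            string relative = Path.GetRelativePath(manifestsRoot, manifestPath);
            string payloadPath = Path.Combine(root, "payloads", relative + ".zst");
            if (!File.Exists(payloadPath))
                throw new FileNotFoundException("Missing payload for manifest", payloadPath);

            // Replay into Surface.FS; PutManifest is idempotent via the digest (see §3).
            // client.PutManifest(manifestJson: json, payload: File.OpenRead(payloadPath));
        }
    }
}
```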

### 6.1 EntryTrace Cache Usage

Scanner.Worker serialises EntryTrace graphs into Surface.FS using `SurfaceCacheKey(namespace: "entrytrace.graph", tenant, sha256(options|env|entrypoint))`. At runtime the worker checks the cache before invoking analyzers; cache hits bypass parsing and feed the result store/attestor pipeline directly. The same namespace is consumed by WebService and CLI to retrieve cached graphs for reporting.
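
A hedged sketch of that key derivation and cache check; the `options|env|entrypoint` composition and the `entrytrace.graph` namespace come from this section, while the cache calls in the trailing comment are placeholders:

```csharp
using System;
using System.Security.Cryptography;
using System.Text;

// Illustration of the "entrytrace.graph" cache-key derivation described above. The
// options|env|entrypoint composition comes from this section; everything else is a sketch.
static class EntryTraceCacheKeyExample
{
    public static string Build(string tenant, string options, string env, string entrypoint)
    {
        byte[] material = Encoding.UTF8.GetBytes($"{options}|{env}|{entrypoint}");
        string sha256 = Convert.ToHexString(SHA256.HashData(material)).ToLowerInvariant();

        // Normalised as {namespace}/{tenant}/{sha256} (see §4).
        return $"entrytrace.graph/{tenant}/{sha256}";
    }
}

// Worker flow (pseudo-calls): on a hit, skip the analyzers and reuse the cached graph;
// on a miss, run the analyzers, then store the serialised graph under the same key:
//   if (cache.TryGet(key, out var graph)) return graph;
//   graph = RunEntryTraceAnalyzers(...); cache.Put(key, graph); return graph;
```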

### 6.2 BuildX generator path

`StellaOps.Scanner.Sbomer.BuildXPlugin` reuses the same CAS layout via the `--surface-*` descriptor flags (or `STELLAOPS_SURFACE_*` env vars). When layer fragment JSON, EntryTrace graph JSON, or NDJSON files are supplied, the plug-in writes them under `scanner/surface/**` within the configured CAS root and emits a manifest pointer so Scanner.WebService can pick up the artefacts without re-scanning. The Surface manifest JSON can also be copied to an arbitrary path via `--surface-manifest-output` for CI artefacts/offline kits.

## 7. Security & Tenancy

- Tenant ID is mandatory; Surface.Validation enforces a match with the Authority token.
- Manifests/payloads are stored in tenant-specific prefixes to prevent leakage.
- Optional manifest signing (future) will use `Surface.Secrets` to load signing keys.
- TLS is enforced between hosts and the Surface.FS endpoint; certificate pins are configured via Surface.Env.

## 8. Observability

- Logs include manifest SHA, tenant, kind, and cache namespace; payload paths are truncated for brevity.
- Prometheus metrics (emitted by Scanner.Worker) now include:
  - `scanner_worker_surface_manifests_published_total`, `scanner_worker_surface_manifests_failed_total`, `scanner_worker_surface_manifests_skipped_total` with labels `{queue, job_kind, surface_result, reason?, surface_payload_count}`.
  - `scanner_worker_surface_payload_persisted_total` with `{surface_kind}` to track cache churn (`entrytrace.graph`, `entrytrace.ndjson`, `layer.fragments`, …).
  - `scanner_worker_surface_manifest_publish_duration_ms` histogram for end-to-end persistence latency.
- Grafana dashboard JSON: `docs/modules/scanner/operations/surface-worker-grafana-dashboard.json` (panels for publish outcomes, latency, per-kind cache rate, and failure reasons). Import alongside the analyzer dashboard and point it to the Scanner Prometheus datasource.
- Tracing spans: `surface.fs.put`, `surface.fs.get`, `surface.fs.cache`.

## 9. Testing Strategy

- Unit tests for the path builder, manifest serializer, and local cache eviction.
- Integration tests using an embedded RustFS or MinIO container to validate API interactions.
- Offline kit tests verifying that the export/import cycle round-trips manifests and payloads.

## 10. Future Enhancements

- Manifest signing (DSSE) to support tamper detection in hostile environments.
- Differential manifests to optimise large SBOM updates.
- Cross-region replication for multi-site deployments.

## 11. References

- Surface.Env Design (`docs/modules/scanner/design/surface-env.md`)
- Surface.Secrets Design (`docs/modules/scanner/design/surface-secrets.md`)
- Surface.Validation Design (`docs/modules/scanner/design/surface-validation.md`)
- Zastava Deployment Runbook (`docs/modules/devops/runbooks/zastava-deployment.md`)