Surface.FS Design (Epic: SURFACE-SHARING)

Status: Draft v1.0 — aligns with tasks SURFACE-FS-01..06, SCANNER-SURFACE-01..05, ZASTAVA-SURFACE-01..02, SCHED-SURFACE-01, OPS-SECRETS-01..02.

Audience: Scanner Worker/WebService, Zastava, Scheduler, DevOps.

1. Purpose

Surface.FS provides a unified content-addressable cache for Scanner-derived artefacts (layer manifests, entry traces, SBOM fragments, runtime deltas). It enables:

  • Sharing scan results between Worker, WebService, Zastava Observer/Webhook, Scheduler planners, Export Center, and future CLI operations.
  • Deterministic reproduction of scan evidence (manifests and payloads) in both connected and air-gapped environments.
  • Efficient data movement by storing manifests once and referencing them via stable pointers.

2. Core Concepts

2.1 Artefact Key

Each artefact is addressed by a tuple (tenant, surfaceKind, contentDigest), where contentDigest is the SHA-256 digest of the canonical payload and surfaceKind identifies the artefact type (the kind field in the manifest schema in 2.2).
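
As a rough illustration of the addressing scheme, the Python sketch below derives an artefact key from a payload. The canonical form used here (JSON with sorted keys and compact separators) is an assumption made for the example; this document does not define the canonicalisation rules, and the names ArtefactKey, canonical_bytes and artefact_key are hypothetical.

```python
import hashlib
import json
from typing import NamedTuple


class ArtefactKey(NamedTuple):
    tenant: str
    surface_kind: str
    content_digest: str  # "sha256:<hex>"


def canonical_bytes(payload: dict) -> bytes:
    # ASSUMPTION: canonical form = sorted keys, compact separators, UTF-8.
    return json.dumps(
        payload, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")


def artefact_key(tenant: str, surface_kind: str, payload: dict) -> ArtefactKey:
    digest = hashlib.sha256(canonical_bytes(payload)).hexdigest()
    return ArtefactKey(tenant, surface_kind, f"sha256:{digest}")


# Two payloads that differ only in key order map to the same artefact key.
a = artefact_key("acme", "layer-entry-trace", {"entrypoint": "/bin/sh", "args": ["-c"]})
b = artefact_key("acme", "layer-entry-trace", {"args": ["-c"], "entrypoint": "/bin/sh"})
```

Because the digest is computed over the canonical bytes rather than the raw input, producers that serialise the same logical payload differently still converge on one key.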

2.2 Manifest

Manifests describe the artefact metadata and storage pointers. They are stored in the surface-manifests bucket and fetched by consumers before retrieving bulk data.

{
  "schema": "stellaops.surface.manifest@1",
  "tenant": "acme",
  "kind": "layer-entry-trace",
  "digest": "sha256:ab12...",
  "createdAt": "2025-10-29T12:00:00Z",
  "expiresAt": "2025-11-05T12:00:00Z",
  "source": {
    "scannerBuild": "stellaops/scanner@sha256:deadbeef",
    "imageDigest": "sha256:cafe...",
    "scanId": "scan-1234"
  },
  "storage": {
    "bucket": "surface-cache",
    "objectKey": "tenants/acme/layer-entry-trace/sha256/ab/12/.../payload.json.zst",
    "sizeBytes": 524288,
    "contentType": "application/json+zstd"
  },
  "integrity": {
    "hash": "sha256:ab12...",
    "signature": null
  }
}
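
A consumer may want to sanity-check a manifest before fetching the payload. The following Python sketch shows one possible set of checks against the example above. The rules themselves (integrity.hash equal to digest, expiresAt later than createdAt, objectKey under the tenant/kind prefix) are assumptions inferred from the sample, not documented validation rules, and check_manifest is a hypothetical helper.

```python
from datetime import datetime

EXPECTED_SCHEMA = "stellaops.surface.manifest@1"


def parse_utc(ts: str) -> datetime:
    # Older Python versions reject a trailing "Z" in fromisoformat().
    return datetime.fromisoformat(ts.replace("Z", "+00:00"))


def check_manifest(m: dict) -> list:
    """Return a list of human-readable problems; empty means the checks passed."""
    problems = []
    if m.get("schema") != EXPECTED_SCHEMA:
        problems.append("unexpected schema")
    # ASSUMPTION: the integrity hash repeats the content digest.
    if m["integrity"]["hash"] != m["digest"]:
        problems.append("integrity.hash does not match digest")
    expires = m.get("expiresAt")  # optional field
    if expires and parse_utc(expires) <= parse_utc(m["createdAt"]):
        problems.append("expiresAt is not after createdAt")
    # ASSUMPTION: payload keys live under tenants/<tenant>/<kind>/.
    prefix = f"tenants/{m['tenant']}/{m['kind']}/"
    if not m["storage"]["objectKey"].startswith(prefix):
        problems.append("objectKey is outside the tenant/kind prefix")
    return problems


example = {
    "schema": "stellaops.surface.manifest@1",
    "tenant": "acme",
    "kind": "layer-entry-trace",
    "digest": "sha256:ab12...",  # elided in the source; kept as a placeholder
    "createdAt": "2025-10-29T12:00:00Z",
    "expiresAt": "2025-11-05T12:00:00Z",
    "storage": {
        "bucket": "surface-cache",
        "objectKey": "tenants/acme/layer-entry-trace/sha256/ab/12/.../payload.json.zst",
    },
    "integrity": {"hash": "sha256:ab12...", "signature": None},
}
tampered = {**example, "integrity": {"hash": "sha256:0000...", "signature": None}}
```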

2.3 Payload Storage

Large payloads (SBOM fragments, entry traces, runtime events) live in the same object store as manifests (RustFS/S3). Manifests record relative paths so offline bundles can copy both manifest and payload without modification.

3. APIs

Surface.FS exposes a gRPC/HTTP API consumed by .NET clients:

| Method | Description |
| --- | --- |
| PutManifest(PutManifestRequest) | Stores manifest + optional payload. Idempotent via digest. |
| GetManifest(GetManifestRequest) | Returns manifest metadata; 404 if missing. |
| GetPayload(GetPayloadRequest) | Streams payload bytes (optionally decompressing). |
| ListManifests(ListManifestRequest) | Enumerates manifests for tenant/kind with pagination. |
| DeleteManifest(DeleteManifestRequest) | (Optional) Removes manifest/payload based on retention policies. |

The .NET client wraps these calls and retries failed requests using Polly policies.
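
To make the "idempotent via digest" and retry behaviour concrete, here is a minimal Python sketch with an in-memory stand-in for the service. It is not the Surface.FS client or Polly; InMemorySurfaceStore, TransientError and with_retries are hypothetical names, and the exponential backoff values are arbitrary.

```python
import time


class TransientError(Exception):
    """Stands in for a retryable failure such as a dropped connection."""


class InMemorySurfaceStore:
    def __init__(self):
        self._manifests = {}
        self.writes = 0

    def put_manifest(self, manifest: dict) -> bool:
        key = (manifest["tenant"], manifest["kind"], manifest["digest"])
        if key in self._manifests:
            return False  # same digest already stored: idempotent no-op
        self._manifests[key] = manifest
        self.writes += 1
        return True

    def get_manifest(self, tenant: str, kind: str, digest: str):
        # None plays the role of the 404 response.
        return self._manifests.get((tenant, kind, digest))


def with_retries(call, attempts: int = 3, base_delay: float = 0.01):
    for attempt in range(attempts):
        try:
            return call()
        except TransientError:
            if attempt == attempts - 1:
                raise  # out of attempts: surface the error to the caller
            time.sleep(base_delay * 2 ** attempt)  # exponential backoff


store = InMemorySurfaceStore()
manifest = {"tenant": "acme", "kind": "layer-entry-trace", "digest": "sha256:ab12..."}
failures = iter([True, True, False])  # fail twice, then succeed


def flaky_put():
    if next(failures):
        raise TransientError("simulated network blip")
    return store.put_manifest(manifest)


first = with_retries(flaky_put)         # stored on the third attempt
second = store.put_manifest(manifest)   # repeated put is a no-op
```

Idempotency is what makes the retry safe: if a write succeeded but the response was lost, repeating the call cannot create a duplicate.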

WebService integration (2025-11-05)

  • /api/v1/scans/{id} and /api/v1/reports responses now include a surface block containing:
    • manifestUri: cas:// pointer to the Surface manifest JSON.
    • manifestDigest: canonical SHA-256 over the manifest payload.
    • manifest.artifacts[]: deterministic list with kind, uri, digest, mediaType, format, and optional view. URIs reuse the ArtifactObjectKeyBuilder semantics (cas://{bucket}/{rootPrefix}/images/...).
  • This allows UI/CLI consumers to fetch manifests or artefacts without additional Surface.FS round-trips.
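
A consumer of the surface block might split the cas:// pointers into bucket and object key as sketched below in Python. The sample response is invented for illustration: the URIs and digests are placeholders, and the exact nesting of manifest.artifacts[] inside the surface block is an assumption based on the field names listed above.

```python
from urllib.parse import urlsplit


def parse_cas_uri(uri: str) -> tuple:
    """Split cas://{bucket}/{key} into (bucket, key)."""
    parts = urlsplit(uri)
    if parts.scheme != "cas" or not parts.netloc:
        raise ValueError(f"not a cas:// URI: {uri!r}")
    return parts.netloc, parts.path.lstrip("/")


def artifacts_by_kind(surface: dict) -> dict:
    # Index the artefact list by kind for direct lookup.
    return {a["kind"]: a for a in surface["manifest"]["artifacts"]}


# Hypothetical "surface" block; all values are placeholders.
surface = {
    "manifestUri": "cas://surface-cache/example/manifest.json",
    "manifestDigest": "sha256:ab12...",
    "manifest": {
        "artifacts": [
            {
                "kind": "entrytrace.graph",
                "uri": "cas://surface-cache/example/entrytrace.graph.json",
                "digest": "sha256:cafe...",
                "mediaType": "application/json",
                "format": "json",
            }
        ]
    },
}

graph = artifacts_by_kind(surface)["entrytrace.graph"]
bucket, key = parse_cas_uri(graph["uri"])
```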

4. Library Responsibilities

The Surface.FS library for .NET hosts provides:

  • ISurfaceManifestWriter / ISurfaceManifestReader interfaces.
  • Content-addressed path builder (SurfacePathBuilder).
  • Tenant namespace isolation and bucket configuration (via Surface.Env).
  • Local cache abstraction ISurfaceCache with default FileSurfaceCache implementation (uses Surface:Cache:Root / SCANNER_SURFACE_CACHE_ROOT, enforces quotas, serialises writes with per-key semaphores).
  • SurfaceCacheKey helper that normalises cache entries as {namespace}/{tenant}/{sha256}. EntryTrace graphs use the entrytrace.graph namespace so Worker/WebService/CLI can share cached results deterministically.
  • Metrics: surface_manifest_put_seconds, surface_manifest_cache_hit_total, etc.
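
The local cache behaviour described above (normalised keys, writes serialised per key) can be sketched in Python as follows. This is an illustration of the idea rather than the FileSurfaceCache implementation: it uses threading locks instead of semaphores, omits quota enforcement, and the names FileCache and cache_key are hypothetical.

```python
import tempfile
import threading
from collections import defaultdict
from pathlib import Path


def cache_key(namespace: str, tenant: str, sha256_hex: str) -> str:
    # Normalised as {namespace}/{tenant}/{sha256}; lower-casing is an assumption.
    return f"{namespace}/{tenant}/{sha256_hex.lower()}"


class FileCache:
    def __init__(self, root: str):
        self._root = Path(root)
        self._locks = defaultdict(threading.Lock)  # one lock per cache key
        self._guard = threading.Lock()             # protects the lock table

    def _lock_for(self, key: str) -> threading.Lock:
        with self._guard:
            return self._locks[key]

    def get_or_create(self, key: str, factory) -> bytes:
        path = self._root / key
        with self._lock_for(key):  # writers for the same key run one at a time
            if path.exists():
                return path.read_bytes()
            data = factory()
            path.parent.mkdir(parents=True, exist_ok=True)
            path.write_bytes(data)
            return data


calls = []


def build_graph() -> bytes:
    calls.append(1)  # count how often the expensive step runs
    return b'{"nodes": []}'


cache = FileCache(tempfile.mkdtemp())
key = cache_key("entrytrace.graph", "acme", "AB" * 32)
threads = [
    threading.Thread(target=cache.get_or_create, args=(key, build_graph))
    for _ in range(4)
]
for t in threads:
    t.start()
for t in threads:
    t.join()
```

With the per-key lock, four concurrent callers for the same key trigger the factory once; callers for different keys do not block each other.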

5. Retention & Eviction

  • Manifests include optional expiresAt; Worker defaults to 30 days for SBOM fragments, 7 days for entry traces.
  • The background job SurfaceCacheMaintenanceService evicts local cache entries, oldest first, once the cache exceeds its quota.
  • Object storage retention policies are managed by DevOps; library exposes metrics but does not auto-delete unless instructed.
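
The oldest-first eviction rule can be illustrated with a short Python sketch. It assumes "oldest" means the earliest last-write timestamp and that the quota is a total byte budget; neither detail is specified here, and Entry and plan_eviction are hypothetical names.

```python
from typing import NamedTuple


class Entry(NamedTuple):
    key: str
    size_bytes: int
    last_write: float  # ASSUMPTION: age is measured by last write time


def plan_eviction(entries: list, quota_bytes: int) -> list:
    """Return the keys to evict, oldest first, until the total fits the quota."""
    total = sum(e.size_bytes for e in entries)
    evicted = []
    for entry in sorted(entries, key=lambda e: e.last_write):
        if total <= quota_bytes:
            break
        evicted.append(entry.key)
        total -= entry.size_bytes
    return evicted


entries = [
    Entry("entrytrace.graph/acme/aaa", 100, last_write=1.0),
    Entry("entrytrace.graph/acme/bbb", 200, last_write=2.0),
    Entry("entrytrace.graph/acme/ccc", 300, last_write=3.0),
]
# 600 bytes cached against a 350-byte quota: the two oldest entries go.
to_evict = plan_eviction(entries, quota_bytes=350)
```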

6. Offline Kit Handling

Offline kits include:

offline/surface/
  manifests/
    tenants/<tenant>/<kind>/<digest>.json
  payloads/
    tenants/<tenant>/<kind>/<digest>.json.zst
  manifest-index.json

The import script calls PutManifest for each manifest and verifies digests as it goes. This lets Zastava and Scheduler instances running offline consume cached data without re-scanning.
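
Before importing, a kit can be checked for completeness by pairing each manifest with its payload. The Python sketch below only maps paths within the layout shown above and reports missing payloads; it does not verify digests, since that would require decompressing the .zst payloads and the digest encoding in file names is not specified here. payload_path_for and missing_payloads are hypothetical helpers.

```python
from pathlib import PurePosixPath


def payload_path_for(manifest_path: str) -> str:
    """Map manifests/tenants/<tenant>/<kind>/<digest>.json to its payload path."""
    p = PurePosixPath(manifest_path)
    parts = p.parts
    if (
        len(parts) != 5
        or parts[0] != "manifests"
        or parts[1] != "tenants"
        or p.suffix != ".json"
    ):
        raise ValueError(f"unexpected manifest path: {manifest_path!r}")
    tenant, kind = parts[2], parts[3]
    return str(PurePosixPath("payloads", "tenants", tenant, kind, p.stem + ".json.zst"))


def missing_payloads(files: set) -> list:
    """List manifests whose payload is absent from the kit (paths relative to offline/surface/)."""
    manifests = sorted(f for f in files if f.startswith("manifests/"))
    return [m for m in manifests if payload_path_for(m) not in files]


# Hypothetical kit contents; "d1" and "d2" stand in for real digests.
kit = {
    "manifest-index.json",
    "manifests/tenants/acme/layer-entry-trace/d1.json",
    "payloads/tenants/acme/layer-entry-trace/d1.json.zst",
    "manifests/tenants/acme/layer-entry-trace/d2.json",
}
incomplete = missing_payloads(kit)
```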

6.1 EntryTrace Cache Usage

Scanner.Worker serialises EntryTrace graphs into Surface.FS using SurfaceCacheKey(namespace: "entrytrace.graph", tenant, sha256(options|env|entrypoint)). At runtime the worker checks the cache before invoking analyzers; cache hits bypass parsing and feed the result store/attestor pipeline directly. The same namespace is consumed by WebService and CLI to retrieve cached graphs for reporting.
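
The cache-before-analysis flow might look like the following Python sketch. The way options, env and entrypoint are serialised and joined before hashing is an assumption (a plain "|" join of strings); the real encoding is not specified in this document, and entrytrace_cache_key and resolve_graph are hypothetical names.

```python
import hashlib


def entrytrace_cache_key(tenant: str, options: str, env: str, entrypoint: str) -> str:
    # ASSUMPTION: inputs are pre-serialised strings joined with "|".
    material = "|".join([options, env, entrypoint]).encode("utf-8")
    return f"entrytrace.graph/{tenant}/{hashlib.sha256(material).hexdigest()}"


def resolve_graph(cache: dict, key: str, analyze):
    """Return (graph, cache_hit); run the analyzer only on a miss."""
    if key in cache:
        return cache[key], True
    graph = analyze()
    cache[key] = graph
    return graph, False


runs = []


def analyze():
    runs.append(1)  # stands in for the expensive EntryTrace analysis
    return {"nodes": ["/bin/sh"], "edges": []}


cache = {}
key = entrytrace_cache_key("acme", "follow-symlinks=true", "PATH=/usr/bin", "/bin/sh -c app")
graph1, hit1 = resolve_graph(cache, key, analyze)  # miss: analyzer runs
graph2, hit2 = resolve_graph(cache, key, analyze)  # hit: analyzer skipped
```

Because the key depends only on the tenant and the analysis inputs, any host that computes the same key can reuse the cached graph.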

6.2 BuildX generator path

StellaOps.Scanner.Sbomer.BuildXPlugin reuses the same CAS layout via the --surface-* descriptor flags (or STELLAOPS_SURFACE_* env vars). When layer fragment JSON, EntryTrace graph JSON, or NDJSON files are supplied, the plug-in writes them under scanner/surface/** within the configured CAS root and emits a manifest pointer so Scanner.WebService can pick up the artefacts without re-scanning. The Surface manifest JSON can also be copied to an arbitrary path via --surface-manifest-output for CI artefacts/offline kits.

7. Security & Tenancy

  • Tenant ID is mandatory; Surface.Validation enforces that it matches the Authority token.
  • Manifests/payloads stored in tenant-specific prefixes to prevent leakage.
  • Optional manifest signing (future) will use Surface.Secrets to load signing keys.
  • TLS enforced between hosts and Surface.FS endpoint; certificate pins configured via Surface.Env.
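
The first two rules above can be sketched as a guard that runs before any storage access. This Python example is illustrative only: the tenant ID pattern, the idea of a single tenant value extracted from the token, and the names tenant_prefix, authorize and denied are assumptions, not the Surface.Validation API.

```python
import re

# ASSUMPTION: tenant IDs are lower-case alphanumerics and hyphens.
TENANT_RE = re.compile(r"[a-z0-9][a-z0-9-]*")


class TenancyError(Exception):
    pass


def tenant_prefix(tenant: str) -> str:
    if not TENANT_RE.fullmatch(tenant):
        raise TenancyError(f"invalid tenant ID: {tenant!r}")
    return f"tenants/{tenant}/"


def authorize(request_tenant: str, token_tenant: str, object_key: str) -> str:
    """Return object_key if the request may access it, else raise TenancyError."""
    if not request_tenant:
        raise TenancyError("tenant ID is mandatory")
    if request_tenant != token_tenant:
        raise TenancyError("tenant does not match the token")
    prefix = tenant_prefix(request_tenant)
    # Reject keys outside the tenant prefix, including ".." traversal attempts.
    if not object_key.startswith(prefix) or ".." in object_key.split("/"):
        raise TenancyError("object key is outside the tenant prefix")
    return object_key


def denied(request_tenant: str, token_tenant: str, object_key: str) -> bool:
    try:
        authorize(request_tenant, token_tenant, object_key)
        return False
    except TenancyError:
        return True


ok_key = authorize("acme", "acme", "tenants/acme/layer-entry-trace/payload.json.zst")
```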

8. Observability

  • Logs include manifest SHA, tenant, kind, and cache namespace; payload paths are truncated for brevity.
  • Prometheus metrics (emitted by Scanner.Worker) now include:
    • scanner_worker_surface_manifests_published_total, scanner_worker_surface_manifests_failed_total, scanner_worker_surface_manifests_skipped_total with labels {queue, job_kind, surface_result, reason?, surface_payload_count}.
    • scanner_worker_surface_payload_persisted_total with {surface_kind} to track cache churn (entrytrace.graph, entrytrace.ndjson, layer.fragments, …).
    • scanner_worker_surface_manifest_publish_duration_ms histogram for end-to-end persistence latency.
  • Grafana dashboard JSON: docs/modules/scanner/operations/surface-worker-grafana-dashboard.json (panels for publish outcomes, latency, per-kind cache rate, and failure reasons). Import alongside the analyzer dashboard and point it to the Scanner Prometheus datasource.
  • Tracing spans: surface.fs.put, surface.fs.get, surface.fs.cache.

9. Testing Strategy

  • Unit tests for path builder, manifest serializer, and local cache eviction.
  • Integration tests using embedded RustFS or MinIO container to validate API interactions.
  • Offline kit tests verifying export/import cycle round-trips manifests and payloads.

10. Future Enhancements

  • Manifest signing (DSSE) to support tamper detection in hostile environments.
  • Differential manifests to optimise large SBOM updates.
  • Cross-region replication for multi-site deployments.

11. References

  • Surface.Env Design (docs/modules/scanner/design/surface-env.md)
  • Surface.Secrets Design (docs/modules/scanner/design/surface-secrets.md)
  • Surface.Validation Design (docs/modules/scanner/design/surface-validation.md)
  • Zastava Deployment Runbook (docs/modules/devops/runbooks/zastava-deployment.md)