Implement Advisory Canonicalization and Backfill Migration
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Added AdvisoryCanonicalizer for canonicalizing advisory identifiers.
- Created EnsureAdvisoryCanonicalKeyBackfillMigration to populate advisory_key and links in advisory_raw documents.
- Introduced FileSurfaceManifestStore for managing surface manifests with file system backing.
- Developed ISurfaceManifestReader and ISurfaceManifestWriter interfaces for reading and writing manifests.
- Implemented SurfaceManifestPathBuilder for constructing paths and URIs for surface manifests.
- Added tests for FileSurfaceManifestStore to ensure correct functionality and deterministic behavior.
- Updated documentation for new features and migration steps.
@@ -1,8 +1,9 @@
# Surface.FS Design (Epic: SURFACE-SHARING)

> **Status:** Draft v1.0 — aligns with tasks `SURFACE-FS-01..06`, `SCANNER-SURFACE-01..05`, `ZASTAVA-SURFACE-01..02`, `SCHED-SURFACE-01`, `OPS-SECRETS-01..02`.
> **Status:** Draft v1.1 — aligns with tasks `SURFACE-FS-01..06`, `SCANNER-SURFACE-01..05`, `ZASTAVA-SURFACE-01..02`, `SCHED-SURFACE-01`, `OPS-SECRETS-01..02`.
>
> **Audience:** Scanner Worker/WebService, Zastava, Scheduler, DevOps.
> **Component map:** See [Scanner architecture — §1 System landscape](../architecture.md#1-system-landscape) for end-to-end placement.

## 1. Purpose

@@ -26,45 +27,61 @@ Manifests describe the artefact metadata and storage pointers. They are stored i
{
  "schema": "stellaops.surface.manifest@1",
  "tenant": "acme",
  "kind": "layer-entry-trace",
  "digest": "sha256:ab12...",
  "createdAt": "2025-10-29T12:00:00Z",
  "expiresAt": "2025-11-05T12:00:00Z",
  "imageDigest": "sha256:cafe...",
  "scanId": "scan-1234",
  "generatedAt": "2025-10-29T12:00:00Z",
  "source": {
    "scannerBuild": "stellaops/scanner@sha256:deadbeef",
    "imageDigest": "sha256:cafe...",
    "scanId": "scan-1234"
    "component": "scanner.worker",
    "version": "2025.10.0",
    "workerInstance": "scanner-worker-1",
    "attempt": 1
  },
  "storage": {
    "bucket": "surface-cache",
    "objectKey": "tenants/acme/layer-entry-trace/sha256/ab/12/.../payload.json.zst",
    "sizeBytes": 524288,
    "contentType": "application/json+zstd"
  },
  "integrity": {
    "hash": "sha256:ab12...",
    "signature": null
  }
  "artifacts": [
    {
      "kind": "entrytrace.graph",
      "uri": "cas://surface-cache/manifests/acme/ab/cd/abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789.json",
      "digest": "sha256:abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789",
      "mediaType": "application/vnd.stellaops.entrytrace+json",
      "format": "json",
      "sizeBytes": 524288,
      "view": "runtime",
      "storage": {
        "bucket": "surface-cache",
        "objectKey": "payloads/acme/entrytrace/sha256/ab/cd/abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789.ndjson.zst",
        "sizeBytes": 524288,
        "contentType": "application/x-ndjson+zstd"
      },
      "metadata": {
        "entrypoint": "/usr/bin/java",
        "surfaceVersion": "1"
      }
    }
  ]
}
```

Manifest URIs follow the deterministic pattern:

```
cas://{bucket}/{prefix}/{tenant}/{digest[0..1]}/{digest[2..3]}/{digest}.json
```

The hex portion of the manifest digest is split into two directory levels to avoid hot directories. The same layout is mirrored on disk by the default `FileSurfaceManifestStore`, which keeps offline bundle sync trivial (copy the `manifests/` tree verbatim).
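
The URI pattern above can be sketched in a few lines. This is an illustrative Python snippet, not the .NET `SurfaceManifestPathBuilder`; the function name `build_manifest_uri` is hypothetical, and it assumes the file name uses only the hex portion of the digest (as in the example `uri` shown in the manifest) and that only `sha256` digests are accepted.

```python
import re

# Assumption: manifest digests are always "sha256:" followed by 64 hex characters.
_HEX_SHA256 = re.compile(r"[0-9a-f]{64}")


def build_manifest_uri(bucket: str, prefix: str, tenant: str, digest: str) -> str:
    """Return cas://{bucket}/{prefix}/{tenant}/{hex[0:2]}/{hex[2:4]}/{hex}.json."""
    algorithm, _, hex_part = digest.partition(":")
    hex_part = hex_part.lower()
    if algorithm != "sha256" or not _HEX_SHA256.fullmatch(hex_part):
        raise ValueError(f"expected 'sha256:<64 hex chars>', got {digest!r}")
    # Two shard levels taken from the first four hex characters.
    shard_one, shard_two = hex_part[0:2], hex_part[2:4]
    return f"cas://{bucket}/{prefix}/{tenant}/{shard_one}/{shard_two}/{hex_part}.json"
```

Called with `bucket="surface-cache"`, `prefix="manifests"`, `tenant="acme"` and the digest from the example manifest, this yields the same `cas://surface-cache/manifests/acme/ab/cd/...` pointer that appears in the `uri` field.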

### 2.3 Payload Storage

Large payloads (SBOM fragments, entry traces, runtime events) live in the same object store as manifests (RustFS/S3). Manifests record relative paths so offline bundles can copy both manifest and payload without modification.

## 3. APIs

Surface.FS exposes a gRPC/HTTP API consumed by .NET clients:
Surface.FS exposes .NET-first abstractions that hosts consume via DI:

| Method | Description |
|--------|-------------|
| `PutManifest(PutManifestRequest)` | Stores manifest + optional payload. Idempotent via `digest`. |
| `GetManifest(GetManifestRequest)` | Returns manifest metadata; 404 if missing. |
| `GetPayload(GetPayloadRequest)` | Streams payload bytes (optionally decompressing). |
| `ListManifests(ListManifestRequest)` | Enumerates manifests for tenant/kind with pagination. |
| `DeleteManifest(DeleteManifestRequest)` | (Optional) Removes manifest/payload based on retention policies. |
- `ISurfaceManifestWriter.PublishAsync(document)` – normalises artefact lists, computes the canonical SHA-256 digest, persists the manifest via the configured store, and returns a `SurfaceManifestPublishResult` containing the digest, canonical URI, and the normalised document.
- `ISurfaceManifestReader.TryGetByUriAsync(uri)` – resolves a manifest pointer (e.g. `cas://surface-cache/manifests/...`) back into a `SurfaceManifestDocument`.
- `ISurfaceManifestReader.TryGetByDigestAsync(digest)` – looks up a manifest by digest, scanning tenant prefixes when necessary (used by Offline Kit importers).
- `ISurfaceCache` (`GetOrCreateAsync`, `TryGetAsync`, `SetAsync`) – lightweight content-addressable cache for hot artefacts (layer fragments, entry trace outputs) hosted on local disk.

.NET client wraps these calls and handles retries using Polly policies.
All components honour configuration bound from `Surface:Cache` and `Surface:Manifest` (or environment mirrors like `SCANNER_SURFACE_CACHE_ROOT`). `SurfaceManifestStoreOptions` controls the URI scheme/bucket/prefix and allows overriding the manifest directory while still defaulting to `<cacheRoot>/manifests`.
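
The "normalise, then compute the canonical SHA-256 digest" step described for `PublishAsync` can be illustrated with a minimal Python sketch. The exact canonicalisation rules are not spelled out here, so everything below is an assumption made for illustration: artefacts are sorted by `(kind, digest)`, keys are serialised in sorted order with compact separators, and the bytes are UTF-8. The function name `canonical_manifest_digest` is hypothetical.

```python
import hashlib
import json


def canonical_manifest_digest(document: dict) -> str:
    """Hash a manifest so that key order and artefact order do not affect the result."""
    normalised = dict(document)  # shallow copy; the caller's document is left untouched
    # Assumed normalisation: a stable artefact order keyed on (kind, digest).
    normalised["artifacts"] = sorted(
        document.get("artifacts", []),
        key=lambda artifact: (artifact.get("kind", ""), artifact.get("digest", "")),
    )
    # Assumed canonical form: sorted keys, no insignificant whitespace, UTF-8 bytes.
    payload = json.dumps(
        normalised, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    ).encode("utf-8")
    return "sha256:" + hashlib.sha256(payload).hexdigest()
```

Two documents that differ only in property order or artefact order hash to the same value under these assumptions, which is the property that makes publishing idempotent and manifest pointers reproducible.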

### WebService integration (2025-11-05)

@@ -78,16 +95,16 @@ Surface.FS exposes a gRPC/HTTP API consumed by .NET clients:

Surface.FS library for .NET hosts provides:

- `ISurfaceManifestWriter` / `ISurfaceManifestReader` interfaces.
- Content-addressed path builder (`SurfacePathBuilder`).
- Tenant namespace isolation and bucket configuration (via Surface.Env).
- Local cache abstraction `ISurfaceCache` with default `FileSurfaceCache` implementation (uses `Surface:Cache:Root` / `SCANNER_SURFACE_CACHE_ROOT`, enforces quotas, serialises writes with per-key semaphores).
- `ISurfaceManifestWriter` / `ISurfaceManifestReader` with the default `FileSurfaceManifestStore` implementation (single-writer semaphore, digest reuse, optional overwrite warning).
- Deterministic pointer builder (`SurfaceManifestPathBuilder`) and options (`SurfaceManifestStoreOptions`, `SurfaceCacheOptions`) that align with `Surface.Env` configuration.
- Local cache abstraction `ISurfaceCache` with default `FileSurfaceCache` implementation (uses `Surface:Cache:Root` / `SCANNER_SURFACE_CACHE_ROOT`, enforces per-key semaphores, stores bytes verbatim).
- `SurfaceCacheKey` helper that normalises cache entries as `{namespace}/{tenant}/{sha256}`. EntryTrace graphs use the `entrytrace.graph` namespace so Worker/WebService/CLI can share cached results deterministically.
- Metrics: `surface_manifest_put_seconds`, `surface_manifest_cache_hit_total`, etc.
- JSON serialiser (`SurfaceCacheJsonSerializer`) that applies camelCase naming, ignores nulls, and uses a stable encoder for reproducible hashing.
- Metrics: `surface_manifest_published_total`, `surface_manifest_cache_hit_total`, plus host-specific counters wired via Scanner Worker instrumentation.
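
The `{namespace}/{tenant}/{sha256}` cache key shape from the list above can be sketched as follows. This is a hypothetical Python stand-in, not the .NET `SurfaceCacheKey`; the names `CacheKey` and `make_cache_key` are invented, and the normalisation rules (trimming, lower-casing, dropping a `sha256:` prefix, rejecting `/` inside a segment) are assumptions.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CacheKey:
    namespace: str
    tenant: str
    sha256: str

    def __post_init__(self) -> None:
        # A segment containing "/" would change the shape of the key.
        for part in (self.namespace, self.tenant, self.sha256):
            if not part or "/" in part:
                raise ValueError(f"invalid cache key segment: {part!r}")

    def __str__(self) -> str:
        return f"{self.namespace}/{self.tenant}/{self.sha256}"


def make_cache_key(namespace: str, tenant: str, content_digest: str) -> CacheKey:
    """Normalise inputs so equal content maps to the same key in every host."""
    hex_part = content_digest.strip().lower().removeprefix("sha256:")
    return CacheKey(namespace.strip().lower(), tenant.strip().lower(), hex_part)
```

Because the key is derived only from the namespace, the tenant, and a content hash, any host that computes it for the same EntryTrace input arrives at the same entry, which is what allows Worker, WebService, and CLI to share results.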

## 5. Retention & Eviction

- Manifests include optional `expiresAt`; Worker defaults to 30 days for SBOM fragments, 7 days for entry traces.
- Manifests capture `generatedAt`; retention windows (30 days for SBOM fragments, 7 days for entry traces) are enforced by job configuration and object-store lifecycle policies. An `expiresAt` field is reserved for future use when automated eviction is introduced.
- Background job `SurfaceCacheMaintenanceService` evicts local cache entries exceeding quota, oldest-first.
- Object storage retention policies are managed by DevOps; library exposes metrics but does not auto-delete unless instructed.
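
Oldest-first eviction against a quota, as described for the local cache, could look roughly like this in Python. It is a sketch only and does not reflect how `SurfaceCacheMaintenanceService` is implemented: it assumes "oldest" means the smallest file modification time and that the quota is a total byte count; `evict_oldest_first` is a hypothetical name.

```python
from pathlib import Path


def evict_oldest_first(cache_root: Path, quota_bytes: int) -> list[Path]:
    """Delete the oldest files under cache_root until total size fits the quota."""
    entries = []
    for path in cache_root.rglob("*"):
        if path.is_file():
            info = path.stat()
            entries.append((info.st_mtime, info.st_size, path))
    # Oldest first; ties are broken by path so the order is deterministic.
    entries.sort(key=lambda entry: (entry[0], str(entry[2])))

    total = sum(size for _, size, _ in entries)
    evicted = []
    for _, size, path in entries:
        if total <= quota_bytes:
            break
        path.unlink()
        total -= size
        evicted.append(path)
    return evicted
```

The function returns the evicted paths so a caller can log them or feed a metric; nothing is deleted when the cache is already within quota.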

@@ -98,13 +115,13 @@ Offline kits include:
```
offline/surface/
  manifests/
    tenants/<tenant>/<kind>/<digest>.json
    <tenant>/<digest[0..1]>/<digest[2..3]>/<digest>.json
  payloads/
    tenants/<tenant>/<kind>/<digest>.json.zst
    <tenant>/<kind>/<digest[0..1]>/<digest[2..3]>/<digest>.json.zst
  manifest-index.json
```

Import script calls `PutManifest` for each manifest, verifying digests. This enables Zastava and Scheduler running offline to consume cached data without re-scanning.
Import script uses `ISurfaceManifestWriter.PublishAsync` for each manifest after verifying the embedded digest, keeping Offline Kit replays identical to online flows. This enables Zastava and Scheduler running offline to consume cached data without re-scanning.
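
A verify-before-publish pass over the sharded `manifests/` tree might be sketched as below. This Python snippet is illustrative and rests on assumptions the text does not confirm: that each manifest file holds the exact canonical bytes whose SHA-256 is the hex digest in its file name, and that the directory shards are the first four hex characters. `iter_verified_manifests` is a hypothetical helper; a real importer would hand each verified manifest to `ISurfaceManifestWriter.PublishAsync`.

```python
import hashlib
from pathlib import Path
from typing import Iterator


def iter_verified_manifests(kit_root: Path) -> Iterator[tuple[str, str, bytes]]:
    """Yield (tenant, digest, raw_bytes) for each manifest whose digest checks out."""
    manifests_dir = kit_root / "manifests"
    # Layout: manifests/<tenant>/<hex[0:2]>/<hex[2:4]>/<hex>.json
    for path in sorted(manifests_dir.glob("*/*/*/*.json")):
        tenant, shard_one, shard_two = path.parts[-4], path.parts[-3], path.parts[-2]
        hex_digest = path.stem
        if (shard_one, shard_two) != (hex_digest[0:2], hex_digest[2:4]):
            raise ValueError(f"shard directories do not match digest: {path}")
        data = path.read_bytes()
        # Assumption: the stored bytes are exactly what was hashed at publish time.
        if hashlib.sha256(data).hexdigest() != hex_digest:
            raise ValueError(f"digest mismatch: {path}")
        yield tenant, "sha256:" + hex_digest, data
```

Failing on the first mismatch keeps a corrupted or tampered kit from being partially imported; a more lenient importer could collect the errors and report them together instead.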

### 6.1 EntryTrace Cache Usage
