Implement Advisory Canonicalization and Backfill Migration
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Added AdvisoryCanonicalizer for canonicalizing advisory identifiers. - Created EnsureAdvisoryCanonicalKeyBackfillMigration to populate advisory_key and links in advisory_raw documents. - Introduced FileSurfaceManifestStore for managing surface manifests with file system backing. - Developed ISurfaceManifestReader and ISurfaceManifestWriter interfaces for reading and writing manifests. - Implemented SurfaceManifestPathBuilder for constructing paths and URIs for surface manifests. - Added tests for FileSurfaceManifestStore to ensure correct functionality and deterministic behavior. - Updated documentation for new features and migration steps.
This commit is contained in:
51
docs/modules/scanner/design/surface-fs-consumers.md
Normal file
51
docs/modules/scanner/design/surface-fs-consumers.md
Normal file
@@ -0,0 +1,51 @@
|
||||
# Surface.FS Consumer Integration Guide (Scheduler & Zastava)
|
||||
|
||||
> **Updated:** 2025-11-07
|
||||
> **Audience:** Scheduler Worker Guild • Zastava Observer Guild • Surface FS Guild
|
||||
> **Depends on:** SURFACE-FS-02 (`FileSurfaceManifestStore`), Surface.Env/Surface.Secrets libraries.
|
||||
|
||||
This note captures the minimum wiring required for downstream services now that `FileSurfaceManifestStore` and the manifest reader/writer abstractions have landed.
|
||||
|
||||
## 1. Shared prerequisites
|
||||
|
||||
- Reference `StellaOps.Scanner.Surface.FS` (net10.0) and call:
|
||||
```csharp
|
||||
services
|
||||
.AddSurfaceFileCache()
|
||||
.AddSurfaceManifestStore();
|
||||
```
|
||||
This binds `Surface:Cache` and `Surface:Manifest` (or `SCANNER_SURFACE_*` overrides).
|
||||
- Pull runtime settings via `ISurfaceEnvironment` to ensure tenants/endpoints line up with Scanner.
|
||||
- Cache root (`Surface:Cache:Root`) must be writable; manifests fall back to `<Root>/manifests` unless explicitly overridden with `Surface:Manifest:RootDirectory`.
|
||||
|
||||
## 2. Manifest reader usage
|
||||
|
||||
```csharp
|
||||
var reader = serviceProvider.GetRequiredService<ISurfaceManifestReader>();
|
||||
var manifest = await reader.TryGetByUriAsync(surfaceUri, cancellationToken);
|
||||
```
|
||||
|
||||
- Accept `cas://{bucket}/{prefix}/{tenant}/{hh}/{tt}/{digest}.json` pointers.
|
||||
- On cache miss, return `null`—callers should fall back to existing recompute paths.
|
||||
- All timestamps are stored in canonical UTC, and metadata dictionaries are alphabetically sorted to keep digests deterministic.
|
||||
|
||||
## 3. Scheduler worker checklist (`SCHED-SURFACE-02`)
|
||||
|
||||
1. Prefetch manifests during planning so reruns can skip redundant layers.
|
||||
2. Persist `{manifestUri, manifestDigest}` alongside run plans for traceability.
|
||||
3. Emit telemetry counters: `scheduler_surface_manifest_prefetch_total{result=hit|miss}`.
|
||||
4. Update `docs/SCHED-WORKER-16-201-PLANNER.md` with the new prefetch flow.
|
||||
|
||||
## 4. Zastava observer checklist (`ZASTAVA-SURFACE-02`)
|
||||
|
||||
1. Resolve manifest pointer from runtime drift events (`entrytrace.graph`, `layer.fragments` kinds).
|
||||
2. Enrich drift diagnostics with `manifestDigest` and `Artifacts[n].metadata`.
|
||||
3. Add failure metric `zastava_surface_manifest_failures_total{reason=not_found|fetch_error}`.
|
||||
4. Expand observer runbook (`docs/modules/zastava/operations/drift.md`) with Surface manifest troubleshooting.
|
||||
|
||||
## 5. Testing guidance
|
||||
|
||||
- Unit-test manifest prefetch/adoption with local `FileSurfaceManifestStore`; use temp directories for isolation.
|
||||
- For integration environments, smoke-test by pointing to the same `Surface:Manifest:RootDirectory` used by Scanner Worker and verifying pointer fetch before scan jobs execute.
|
||||
|
||||
Coordinate status updates in the relevant `TASKS.md` entries and `docs/implplan/SPRINT_130_scanner_surface.md` once each guild completes its part. If you discover additional shared requirements, extend this guide so future consumers (CLI, Orchestrator) can reuse the flow.
|
||||
@@ -1,8 +1,9 @@
|
||||
# Surface.FS Design (Epic: SURFACE-SHARING)
|
||||
|
||||
> **Status:** Draft v1.0 — aligns with tasks `SURFACE-FS-01..06`, `SCANNER-SURFACE-01..05`, `ZASTAVA-SURFACE-01..02`, `SCHED-SURFACE-01`, `OPS-SECRETS-01..02`.
|
||||
> **Status:** Draft v1.1 — aligns with tasks `SURFACE-FS-01..06`, `SCANNER-SURFACE-01..05`, `ZASTAVA-SURFACE-01..02`, `SCHED-SURFACE-01`, `OPS-SECRETS-01..02`.
|
||||
>
|
||||
> **Audience:** Scanner Worker/WebService, Zastava, Scheduler, DevOps.
|
||||
> **Component map:** See [Scanner architecture — §1 System landscape](../architecture.md#1-system-landscape) for end-to-end placement.
|
||||
|
||||
## 1. Purpose
|
||||
|
||||
@@ -26,45 +27,61 @@ Manifests describe the artefact metadata and storage pointers. They are stored i
|
||||
{
|
||||
"schema": "stellaops.surface.manifest@1",
|
||||
"tenant": "acme",
|
||||
"kind": "layer-entry-trace",
|
||||
"digest": "sha256:ab12...",
|
||||
"createdAt": "2025-10-29T12:00:00Z",
|
||||
"expiresAt": "2025-11-05T12:00:00Z",
|
||||
"imageDigest": "sha256:cafe...",
|
||||
"scanId": "scan-1234",
|
||||
"generatedAt": "2025-10-29T12:00:00Z",
|
||||
"source": {
|
||||
"scannerBuild": "stellaops/scanner@sha256:deadbeef",
|
||||
"imageDigest": "sha256:cafe...",
|
||||
"scanId": "scan-1234"
|
||||
"component": "scanner.worker",
|
||||
"version": "2025.10.0",
|
||||
"workerInstance": "scanner-worker-1",
|
||||
"attempt": 1
|
||||
},
|
||||
"storage": {
|
||||
"bucket": "surface-cache",
|
||||
"objectKey": "tenants/acme/layer-entry-trace/sha256/ab/12/.../payload.json.zst",
|
||||
"sizeBytes": 524288,
|
||||
"contentType": "application/json+zstd"
|
||||
},
|
||||
"integrity": {
|
||||
"hash": "sha256:ab12...",
|
||||
"signature": null
|
||||
}
|
||||
"artifacts": [
|
||||
{
|
||||
"kind": "entrytrace.graph",
|
||||
"uri": "cas://surface-cache/manifests/acme/ab/cd/abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789.json",
|
||||
"digest": "sha256:abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789",
|
||||
"mediaType": "application/vnd.stellaops.entrytrace+json",
|
||||
"format": "json",
|
||||
"sizeBytes": 524288,
|
||||
"view": "runtime",
|
||||
"storage": {
|
||||
"bucket": "surface-cache",
|
||||
"objectKey": "payloads/acme/entrytrace/sha256/ab/cd/abcdef0123456789abcdef0123456789abcdef0123456789abcdef0123456789.ndjson.zst",
|
||||
"sizeBytes": 524288,
|
||||
"contentType": "application/x-ndjson+zstd"
|
||||
},
|
||||
"metadata": {
|
||||
"entrypoint": "/usr/bin/java",
|
||||
"surfaceVersion": "1"
|
||||
}
|
||||
}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
Manifest URIs follow the deterministic pattern:
|
||||
|
||||
```
|
||||
cas://{bucket}/{prefix}/{tenant}/{digest[0..1]}/{digest[2..3]}/{digest}.json
|
||||
```
|
||||
|
||||
The hex portion of the manifest digest is split into two directory levels to avoid hot directories. The same layout is mirrored on disk by the default `FileSurfaceManifestStore`, which keeps offline bundle sync trivial (copy the `manifests/` tree verbatim).
|
||||
|
||||
### 2.3 Payload Storage
|
||||
|
||||
Large payloads (SBOM fragments, entry traces, runtime events) live in the same object store as manifests (RustFS/S3). Manifests record relative paths so offline bundles can copy both manifest and payload without modification.
|
||||
|
||||
## 3. APIs
|
||||
|
||||
Surface.FS exposes a gRPC/HTTP API consumed by .NET clients:
|
||||
Surface.FS exposes .NET-first abstractions that hosts consume via DI:
|
||||
|
||||
| Method | Description |
|
||||
|--------|-------------|
|
||||
| `PutManifest(PutManifestRequest)` | Stores manifest + optional payload. Idempotent via `digest`. |
|
||||
| `GetManifest(GetManifestRequest)` | Returns manifest metadata; 404 if missing. |
|
||||
| `GetPayload(GetPayloadRequest)` | Streams payload bytes (optionally decompressing). |
|
||||
| `ListManifests(ListManifestRequest)` | Enumerates manifests for tenant/kind with pagination. |
|
||||
| `DeleteManifest(DeleteManifestRequest)` | (Optional) Removes manifest/payload based on retention policies. |
|
||||
- `ISurfaceManifestWriter.PublishAsync(document)` – normalises artefact lists, computes the canonical SHA-256 digest, persists the manifest via the configured store, and returns a `SurfaceManifestPublishResult` containing the digest, canonical URI, and the normalised document.
|
||||
- `ISurfaceManifestReader.TryGetByUriAsync(uri)` – resolves a manifest pointer (e.g. `cas://surface-cache/manifests/...`) back into a `SurfaceManifestDocument`.
|
||||
- `ISurfaceManifestReader.TryGetByDigestAsync(digest)` – looks up a manifest by digest, scanning tenant prefixes when necessary (used by Offline Kit importers).
|
||||
- `ISurfaceCache` (`GetOrCreateAsync`, `TryGetAsync`, `SetAsync`) – lightweight content-addressable cache for hot artefacts (layer fragments, entry trace outputs) hosted on local disk.
|
||||
|
||||
.NET client wraps these calls and handles retries using Polly policies.
|
||||
All components honour configuration bound from `Surface:Cache` and `Surface:Manifest` (or environment mirrors like `SCANNER_SURFACE_CACHE_ROOT`). `SurfaceManifestStoreOptions` controls the URI scheme/bucket/prefix and allows overriding the manifest directory while still defaulting to `<cacheRoot>/manifests`.
|
||||
|
||||
### WebService integration (2025-11-05)
|
||||
|
||||
@@ -78,16 +95,16 @@ Surface.FS exposes a gRPC/HTTP API consumed by .NET clients:
|
||||
|
||||
Surface.FS library for .NET hosts provides:
|
||||
|
||||
- `ISurfaceManifestWriter` / `ISurfaceManifestReader` interfaces.
|
||||
- Content-addressed path builder (`SurfacePathBuilder`).
|
||||
- Tenant namespace isolation and bucket configuration (via Surface.Env).
|
||||
- Local cache abstraction `ISurfaceCache` with default `FileSurfaceCache` implementation (uses `Surface:Cache:Root` / `SCANNER_SURFACE_CACHE_ROOT`, enforces quotas, serialises writes with per-key semaphores).
|
||||
- `ISurfaceManifestWriter` / `ISurfaceManifestReader` with the default `FileSurfaceManifestStore` implementation (single-writer semaphore, digest reuse, optional overwrite warning).
|
||||
- Deterministic pointer builder (`SurfaceManifestPathBuilder`) and options (`SurfaceManifestStoreOptions`, `SurfaceCacheOptions`) that align with `Surface.Env` configuration.
|
||||
- Local cache abstraction `ISurfaceCache` with default `FileSurfaceCache` implementation (uses `Surface:Cache:Root` / `SCANNER_SURFACE_CACHE_ROOT`, enforces per-key semaphores, stores bytes verbatim).
|
||||
- `SurfaceCacheKey` helper that normalises cache entries as `{namespace}/{tenant}/{sha256}`. EntryTrace graphs use the `entrytrace.graph` namespace so Worker/WebService/CLI can share cached results deterministically.
|
||||
- Metrics: `surface_manifest_put_seconds`, `surface_manifest_cache_hit_total`, etc.
|
||||
- JSON serialiser (`SurfaceCacheJsonSerializer`) that applies camelCase naming, ignores nulls, and uses a stable encoder for reproducible hashing.
|
||||
- Metrics: `surface_manifest_published_total`, `surface_manifest_cache_hit_total`, plus host-specific counters wired via Scanner Worker instrumentation.
|
||||
|
||||
## 5. Retention & Eviction
|
||||
|
||||
- Manifests include optional `expiresAt`; Worker defaults to 30 days for SBOM fragments, 7 days for entry traces.
|
||||
- Manifests capture `generatedAt`; retention windows (30 days for SBOM fragments, 7 days for entry traces) are enforced by job configuration and object-store lifecycle policies. An `expiresAt` field is reserved for future use when automated eviction is introduced.
|
||||
- Background job `SurfaceCacheMaintenanceService` evicts local cache entries exceeding quota, oldest-first.
|
||||
- Object storage retention policies are managed by DevOps; library exposes metrics but does not auto-delete unless instructed.
|
||||
|
||||
@@ -98,13 +115,13 @@ Offline kits include:
|
||||
```
|
||||
offline/surface/
|
||||
manifests/
|
||||
tenants/<tenant>/<kind>/<digest>.json
|
||||
<tenant>/<digest[0..1]>/<digest[2..3]>/<digest>.json
|
||||
payloads/
|
||||
tenants/<tenant>/<kind>/<digest>.json.zst
|
||||
<tenant>/<kind>/<digest[0..1]>/<digest[2..3]>/<digest>.json.zst
|
||||
manifest-index.json
|
||||
```
|
||||
|
||||
Import script calls `PutManifest` for each manifest, verifying digests. This enables Zastava and Scheduler running offline to consume cached data without re-scanning.
|
||||
Import script uses `ISurfaceManifestWriter.PublishAsync` for each manifest after verifying the embedded digest, keeping Offline Kit replays identical to online flows. This enables Zastava and Scheduler running offline to consume cached data without re-scanning.
|
||||
|
||||
### 6.1 EntryTrace Cache Usage
|
||||
|
||||
|
||||
Reference in New Issue
Block a user