# Surface.Env Design (Epic: SURFACE-SHARING)

> **Status:** Draft v1.0, aligned with tasks `SURFACE-ENV-01..05`, `SCANNER-ENV-01..03`, `ZASTAVA-ENV-01..02`, `OPS-ENV-01`.
>
> **Audience:** Scanner Worker/WebService engineers, Zastava engineers, DevOps/Ops teams.

## 1. Goals

Surface.Env centralises configuration discovery for every component that touches the shared Scanner “surface” (cache, manifests, secrets). The library replaces ad-hoc environment lookups with a deterministic, validated contract that:

1. Works identically across Scanner Worker, Scanner WebService, BuildX plug-ins, Zastava Observer/Webhook, and future consumers (Scheduler planners, CLI runners).
2. Supports both connected and air-gapped deployments with clear defaults.
3. Records configuration intent (tenant isolation, cache limits, TLS, feature flags) so Surface.Validation can enforce preconditions before any work executes.

## 2. Architecture Overview

```
+-----------------------+
| Host (Worker/WebSvc)  |
|  - IConfiguration     |
|  - ILogger            |
|                       |
|  +-----------------+  |
|  | SurfaceEnv      |  |  loads env vars / config file
|  |  - Provider     |--+------------------------------+
|  |  - Validators   |                                 |
|  +-----------------+                                 |
|           |                                          |
|           |      IResolvedSurfaceConfiguration       |
|           v                                          v
|  Surface.FS / Surface.Secrets / Surface.Validation consumers
+-------------------------------------------------------------
```

Surface.Env exposes `ISurfaceEnvironment`, which returns an immutable `SurfaceEnvironmentSettings` record. Hosts call `SurfaceEnvironmentBuilder.Build()` during startup, passing optional configuration overrides (for example, Helm chart values). The builder resolves environment variables, applies defaults, and executes Surface.Validation rules before handing settings to downstream services.

## 3. Configuration Schema

### 3.1 Common keys

| Variable | Description | Default | Notes |
|----------|-------------|---------|-------|
| `SCANNER_SURFACE_FS_ENDPOINT` | Base URI for Surface.FS / RustFS / S3-compatible store. | _required_ | Throws `SurfaceEnvironmentException` when `RequireSurfaceEndpoint = true`. When disabled (tests), builder falls back to `https://surface.invalid` so validation can fail fast. Also binds `Surface:Fs:Endpoint` from `IConfiguration`. |
| `SCANNER_SURFACE_FS_BUCKET` | Bucket/container used for manifests and artefacts. | `surface-cache` | Must be unique per tenant; validators enforce non-empty value. |
| `SCANNER_SURFACE_FS_REGION` | Optional region for S3-compatible stores. | `null` | Needed only when the backing store requires it (AWS/GCS). |
| `SCANNER_SURFACE_CACHE_ROOT` | Local directory for warm caches. | `<temp>/stellaops/surface` | Directory is created if missing. Override to `/var/lib/stellaops/surface` (or another fast SSD) in production. |
| `SCANNER_SURFACE_CACHE_QUOTA_MB` | Soft limit for on-disk cache usage. | `4096` | Enforced range 64–262144 MB; validation emits `SURFACE_ENV_CACHE_QUOTA_INVALID` outside the range. |
| `SCANNER_SURFACE_PREFETCH_ENABLED` | Enables manifest prefetch threads. | `false` | Workers honour this before analyzer execution. |
| `SCANNER_SURFACE_TENANT` | Tenant namespace used by cache + secret resolvers. | `TenantResolver(...)` or `"default"` | Default resolver may pull from Authority claims; you can override via env for multi-tenant pools. |
| `SCANNER_SURFACE_FEATURES` | Comma-separated feature switches. | `""` | Compared against `SurfaceEnvironmentOptions.KnownFeatureFlags`; unknown flags raise warnings. |
| `SCANNER_SURFACE_TLS_CERT_PATH` | Path to PEM/PKCS#12 file for client auth. | `null` | When present, `SurfaceEnvironmentBuilder` loads the certificate into `SurfaceTlsConfiguration`. |
| `SCANNER_SURFACE_TLS_KEY_PATH` | Optional private-key path when cert/key are stored separately. | `null` | Stored in `SurfaceTlsConfiguration` for hosts that need to hydrate the key themselves. |

### 3.2 Secrets provider keys

| Variable | Description | Notes |
|----------|-------------|-------|
| `SCANNER_SURFACE_SECRETS_PROVIDER` | Provider ID (`kubernetes`, `file`, `inline`, future back-ends). | Defaults to `kubernetes`; validators reject unknown values via `SURFACE_SECRET_PROVIDER_UNKNOWN`. |
| `SCANNER_SURFACE_SECRETS_ROOT` | Path or base namespace for the provider. | Required for the `file` provider (e.g., `/etc/stellaops/secrets`). |
| `SCANNER_SURFACE_SECRETS_NAMESPACE` | Kubernetes namespace used by the secrets provider. | Mandatory when `provider = kubernetes`. |
| `SCANNER_SURFACE_SECRETS_FALLBACK_PROVIDER` | Optional secondary provider ID. | Enables tiered lookups (e.g., `kubernetes` → `inline`) without changing code. |
| `SCANNER_SURFACE_SECRETS_ALLOW_INLINE` | Allows returning inline secrets (useful for tests). | Defaults to `false`; production deployments should keep this disabled. |
| `SCANNER_SURFACE_SECRETS_TENANT` | Tenant override for secret lookups. | Defaults to `SCANNER_SURFACE_TENANT` or the tenant resolver result. |

### 3.3 Component-specific prefixes

`SurfaceEnvironmentOptions.Prefixes` controls the order in which prefixes are probed. Each variable in the tables above is looked up by its suffix (the part after `SCANNER_`, e.g., `SURFACE_FS_ENDPOINT`): the builder combines the suffix with each configured prefix in order (e.g., `ZASTAVA_SURFACE_FS_ENDPOINT`, then `SCANNER_SURFACE_FS_ENDPOINT`) and finally tries the bare suffix (`SURFACE_FS_ENDPOINT`). Configure prefixes per host so that local overrides win while global scanner defaults remain available:

| Component | Suggested prefixes (first match wins) | Notes |
|-----------|---------------------------------------|-------|
| Scanner.Worker / WebService | `SCANNER` | Default – already added by `AddSurfaceEnvironment`. |
| Zastava Observer/Webhook (planned) | `ZASTAVA`, `SCANNER` | Call `options.AddPrefix("ZASTAVA")` before relying on `ZASTAVA_*` overrides. |
| Future CLI / BuildX plug-ins | `CLI`, `SCANNER` | Allows per-user overrides without breaking shared env files. |

With this approach, operators can define a single env file (`SCANNER_*`) and override only the handful of settings that diverge for a specific component by introducing an additional prefix.

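The probing order described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the library's implementation: `PrefixProbe`, `CandidateKeys`, and `Resolve` are hypothetical names, and the environment is passed in as a dictionary so the example stays self-contained.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class PrefixProbe
{
    // Builds the ordered list of variable names to try for one suffix:
    // each configured prefix first (in order), then the bare suffix.
    public static IReadOnlyList<string> CandidateKeys(IEnumerable<string> prefixes, string suffix)
    {
        var keys = new List<string>();
        foreach (var prefix in prefixes)
        {
            keys.Add($"{prefix}_{suffix}");
        }
        keys.Add(suffix);
        return keys;
    }

    // Returns the first non-blank value among the candidates, or null when none is set.
    public static string? Resolve(
        IReadOnlyDictionary<string, string> environment,
        IEnumerable<string> prefixes,
        string suffix)
    {
        foreach (var key in CandidateKeys(prefixes, suffix))
        {
            if (environment.TryGetValue(key, out var value) && !string.IsNullOrWhiteSpace(value))
            {
                return value;
            }
        }
        return null;
    }
}
```

With prefixes `ZASTAVA`, `SCANNER`, a host that only sees `SCANNER_SURFACE_FS_BUCKET` picks up the shared value, while setting `ZASTAVA_SURFACE_FS_BUCKET` overrides it for that component alone.
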
### 3.4 Configuration precedence

The builder resolves every suffix using the following precedence:

1. Environment variables using the configured prefixes (e.g., `ZASTAVA_SURFACE_FS_ENDPOINT`, then `SCANNER_SURFACE_FS_ENDPOINT`, then the bare `SURFACE_FS_ENDPOINT`).
2. Configuration values under the `Surface:*` section (for example `Surface:Fs:Endpoint`, `Surface:Cache:Root` in `appsettings.json` or Helm values).
3. Hard-coded defaults baked into `SurfaceEnvironmentBuilder` (temporary directory, `surface-cache` bucket, etc.).

`SurfaceEnvironmentOptions.RequireSurfaceEndpoint` controls whether a missing endpoint results in an exception (default: `true`). Other values fall back to the default listed in § 3.1/3.2 and are further validated by the Surface.Validation pipeline.

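The three-tier precedence above could look roughly like this in code. It is a sketch only: `SurfacePrecedence` and `ResolveSetting` are hypothetical names, and plain dictionaries stand in for the process environment and for `IConfiguration` so the block has no external dependencies.

```csharp
#nullable enable
using System;
using System.Collections.Generic;

// Hypothetical helper, for illustration only.
public static class SurfacePrecedence
{
    // Resolves one setting in the documented order: prefixed environment
    // variables, the bare suffix, a configuration key, then the default.
    public static string ResolveSetting(
        IReadOnlyDictionary<string, string> environment,
        IReadOnlyDictionary<string, string> configuration,
        IReadOnlyList<string> prefixes,
        string suffix,
        string configurationKey,
        string defaultValue)
    {
        foreach (var prefix in prefixes)
        {
            if (TryGet(environment, $"{prefix}_{suffix}", out var fromPrefixed))
            {
                return fromPrefixed;
            }
        }

        if (TryGet(environment, suffix, out var fromBare))
        {
            return fromBare;
        }

        if (TryGet(configuration, configurationKey, out var fromConfig))
        {
            return fromConfig;
        }

        return defaultValue;
    }

    // Treats missing and blank values the same way: as "not set".
    private static bool TryGet(IReadOnlyDictionary<string, string> source, string key, out string value)
    {
        if (source.TryGetValue(key, out var found) && !string.IsNullOrWhiteSpace(found))
        {
            value = found;
            return true;
        }

        value = string.Empty;
        return false;
    }
}
```

Under these assumptions, a `SCANNER_SURFACE_CACHE_ROOT` environment variable wins over a `Surface:Cache:Root` configuration value, and the default is used only when neither source supplies a non-blank value.
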
## 4. API Surface

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Security.Cryptography.X509Certificates;

public interface ISurfaceEnvironment
{
    // Immutable snapshot of the resolved settings.
    SurfaceEnvironmentSettings Settings { get; }

    // Env/config keys that produced the snapshot (for diagnostics bundles).
    IReadOnlyDictionary<string, string> RawVariables { get; }
}

public sealed record SurfaceEnvironmentSettings(
    Uri SurfaceFsEndpoint,
    string SurfaceFsBucket,
    string? SurfaceFsRegion,
    DirectoryInfo CacheRoot,
    int CacheQuotaMegabytes,
    bool PrefetchEnabled,
    IReadOnlyCollection<string> FeatureFlags,
    SurfaceSecretsConfiguration Secrets,
    string Tenant,
    SurfaceTlsConfiguration Tls)
{
    public DateTimeOffset CreatedAtUtc { get; init; }
}

public sealed record SurfaceSecretsConfiguration(
    string Provider,
    string Tenant,
    string? Root,
    string? Namespace,
    string? FallbackProvider,
    bool AllowInline);

public sealed record SurfaceTlsConfiguration(
    string? CertificatePath,
    string? PrivateKeyPath,
    X509Certificate2Collection? ClientCertificates);
```

`ISurfaceEnvironment.RawVariables` captures the exact env/config keys that produced the snapshot so operators can export them in diagnostics bundles.

`SurfaceEnvironmentOptions` configures how the snapshot is built:

* `ComponentName` – used in logs/validation output.
* `Prefixes` – ordered list of env prefixes (see § 3.3). Defaults to `["SCANNER"]`.
* `RequireSurfaceEndpoint` – throw when no endpoint is provided (default `true`).
* `TenantResolver` – delegate invoked when `SCANNER_SURFACE_TENANT` is absent.
* `KnownFeatureFlags` – recognised feature switches; unexpected values raise warnings.

Example registration:

```csharp
builder.Services.AddSurfaceEnvironment(options =>
{
    options.ComponentName = "Scanner.Worker";
    options.AddPrefix("ZASTAVA"); // optional future override
    options.KnownFeatureFlags.Add("validation");
    options.TenantResolver = sp => sp.GetRequiredService<ITenantContext>().TenantId;
});
```

Consumers access `ISurfaceEnvironment.Settings` and pass the record into Surface.FS, Surface.Secrets, cache, and validation helpers. The interface memoises results so repeated access is cheap.

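As an illustration of how `SCANNER_SURFACE_FEATURES` might be checked against `KnownFeatureFlags`, the sketch below splits the comma-separated value and separates out unrecognised flags. `FeatureFlagParser` is a hypothetical name, and the trimming, de-duplication, and case-insensitive comparison are assumptions rather than documented behaviour.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper, for illustration only.
public static class FeatureFlagParser
{
    // Splits a comma-separated flag list, trims entries, drops blanks and
    // case-insensitive duplicates, and reports flags missing from the known set.
    public static (IReadOnlyCollection<string> Flags, IReadOnlyCollection<string> Unknown) Parse(
        string? rawValue,
        IEnumerable<string> knownFlags)
    {
        var known = new HashSet<string>(knownFlags, StringComparer.OrdinalIgnoreCase);

        var flags = (rawValue ?? string.Empty)
            .Split(',', StringSplitOptions.RemoveEmptyEntries | StringSplitOptions.TrimEntries)
            .Distinct(StringComparer.OrdinalIgnoreCase)
            .ToArray();

        // Unknown flags are returned rather than rejected, so the caller can log warnings.
        var unknown = flags.Where(flag => !known.Contains(flag)).ToArray();

        return (flags, unknown);
    }
}
```

A caller would keep every parsed flag in `FeatureFlags` and emit one warning per entry in `Unknown`, which matches the "unknown flags raise warnings" behaviour described in § 3.1.
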
## 5. Validation

`SurfaceEnvironmentBuilder` only throws `SurfaceEnvironmentException` for malformed inputs (non-integer quota, invalid URI, missing required variable when `RequireSurfaceEndpoint = true`). The richer validation pipeline lives in `StellaOps.Scanner.Surface.Validation` and runs via `services.AddSurfaceValidation()`:

1. **SurfaceEndpointValidator** – checks for a non-placeholder endpoint and bucket (`SURFACE_ENV_MISSING_ENDPOINT`, `SURFACE_FS_BUCKET_MISSING`).
2. **SurfaceCacheValidator** – verifies the cache directory exists/is writable and that the quota is positive (`SURFACE_ENV_CACHE_DIR_UNWRITABLE`, `SURFACE_ENV_CACHE_QUOTA_INVALID`).
3. **SurfaceSecretsValidator** – validates provider names, required namespace/root fields, and tenant presence (`SURFACE_SECRET_PROVIDER_UNKNOWN`, `SURFACE_SECRET_CONFIGURATION_MISSING`, `SURFACE_ENV_TENANT_MISSING`).

Validators emit `SurfaceValidationIssue` instances with codes defined in `SurfaceValidationIssueCodes`. `LoggingSurfaceValidationReporter` writes structured log entries (Info/Warning/Error) using the component name, issue code, and remediation hint. Hosts fail startup if any issue has `Error` severity; warnings allow startup but surface actionable hints.

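To make the severity rule concrete, here is a small sketch of a quota check and the "fail startup on `Error`" decision. The types below (`IssueSeverity`, `ValidationIssue`, `CacheQuotaCheck`) are stand-ins invented for this example; the real `SurfaceValidationIssue` shape may differ, and treating an out-of-range quota as `Error` is an assumption.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;

// Stand-in types for illustration; the real ones live in
// StellaOps.Scanner.Surface.Validation and may look different.
public enum IssueSeverity { Info, Warning, Error }

public sealed record ValidationIssue(IssueSeverity Severity, string Code, string Message, string Hint);

public static class CacheQuotaCheck
{
    // Range taken from the SCANNER_SURFACE_CACHE_QUOTA_MB row in section 3.1.
    public const int MinimumMegabytes = 64;
    public const int MaximumMegabytes = 262144;

    // Yields one issue when the quota falls outside the documented range
    // (Error severity is an assumption made for this sketch).
    public static IEnumerable<ValidationIssue> Validate(int quotaMegabytes)
    {
        if (quotaMegabytes < MinimumMegabytes || quotaMegabytes > MaximumMegabytes)
        {
            yield return new ValidationIssue(
                IssueSeverity.Error,
                "SURFACE_ENV_CACHE_QUOTA_INVALID",
                $"Cache quota {quotaMegabytes} MB is outside {MinimumMegabytes}-{MaximumMegabytes} MB.",
                "Set SCANNER_SURFACE_CACHE_QUOTA_MB to a value inside the supported range.");
        }
    }

    // Startup is blocked only when at least one issue has Error severity;
    // Info and Warning issues are reported but do not stop the host.
    public static bool BlocksStartup(IEnumerable<ValidationIssue> issues) =>
        issues.Any(issue => issue.Severity == IssueSeverity.Error);
}
```

A host would collect the issues from every validator, hand them to the reporter, and abort startup only when `BlocksStartup` returns `true`.
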
## 6. Integration Guidance

- **Scanner Worker**: register `AddSurfaceEnvironment`, `AddSurfaceValidation`, `AddSurfaceFileCache`, and `AddSurfaceSecrets` before analyzer/services (see `src/Scanner/StellaOps.Scanner.Worker/Program.cs`). `SurfaceCacheOptionsConfigurator` already binds the cache root from `ISurfaceEnvironment`.
- **Scanner WebService**: identical wiring, plus `SurfacePointerService`/`ScannerSurfaceSecretConfigurator` reuse the resolved settings (`Program.cs` demonstrates the pattern).
- **Zastava Observer/Webhook**: will reuse the same helper once the service adds `AddSurfaceEnvironment(options => options.AddPrefix("ZASTAVA"))` so per-component overrides function without diverging defaults.
- **Scheduler / CLI / BuildX (future)**: treat `ISurfaceEnvironment` as read-only input; secret lookup, cache plumbing, and validation happen before any queue/enqueue work.

Readiness probes should invoke `ISurfaceValidatorRunner` (registered by `AddSurfaceValidation`) and fail the endpoint when any issue is returned. The Scanner Worker/WebService hosted services already run the validators on startup; other consumers should follow the same pattern.

### 6.1 Validation output

`LoggingSurfaceValidationReporter` produces log entries that include:

```
Surface validation issue for component Scanner.Worker: SURFACE_ENV_MISSING_ENDPOINT - Surface FS endpoint is missing or invalid. Hint: Set SCANNER_SURFACE_FS_ENDPOINT to the RustFS/S3 endpoint.
```

Treat `SurfaceValidationIssueCodes.*` with severity `Error` as hard blockers (readiness must fail). `Warning` entries flag configuration drift (for example, missing namespaces) but allow startup so staging/offline runs can proceed. The codes appear in both the structured log state and the reporter payload, making it easy to alert on them.

## 7. Security & Observability

- Surface.Env never logs raw values; only suffix names and issue codes appear in logs. `RawVariables` is intended for diagnostics bundles and should be treated as sensitive metadata.
- TLS certificates are loaded into memory and not re-serialised; only the configured paths are exposed to downstream services.
- To emit metrics, register a custom `ISurfaceValidationReporter` (e.g., wrapping Prometheus counters) in addition to the logging reporter.

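One possible shape for such a metrics reporter is sketched below. The `ISurfaceValidationReporter` signature is not shown in this document, so the example defines its own hypothetical `IIssueReporter` interface; `CountingIssueReporter` and `CompositeIssueReporter` are likewise illustrative names. In keeping with the first point above, only the component name and issue code are recorded, never raw values.

```csharp
#nullable enable
using System.Collections.Concurrent;
using System.Collections.Generic;

// Hypothetical reporter shape; the real ISurfaceValidationReporter may differ.
public interface IIssueReporter
{
    void Report(string componentName, string issueCode);
}

// Counts issues per (component, code) pair so a metrics exporter can read them.
public sealed class CountingIssueReporter : IIssueReporter
{
    private readonly ConcurrentDictionary<(string Component, string Code), long> _counts = new();

    public void Report(string componentName, string issueCode) =>
        _counts.AddOrUpdate((componentName, issueCode), 1L, (_, current) => current + 1L);

    public long CountFor(string componentName, string issueCode) =>
        _counts.TryGetValue((componentName, issueCode), out var count) ? count : 0L;
}

// Fans each issue out to several reporters, e.g. logging plus counting.
public sealed class CompositeIssueReporter : IIssueReporter
{
    private readonly IReadOnlyList<IIssueReporter> _reporters;

    public CompositeIssueReporter(params IIssueReporter[] reporters) => _reporters = reporters;

    public void Report(string componentName, string issueCode)
    {
        foreach (var reporter in _reporters)
        {
            reporter.Report(componentName, issueCode);
        }
    }
}
```

The composite keeps the logging reporter in place while adding counters alongside it; a Prometheus or OpenTelemetry exporter could then read the per-code totals.
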
## 8. Offline & Air-Gap Support

- Defaults assume no public network access; point `SCANNER_SURFACE_FS_ENDPOINT` at an internal RustFS/S3 mirror.
- Offline bundles must capture an env file (Ops track this under the Offline Kit tasks) so operators can seed `SCANNER_*` values before first boot.
- Keep `docs/modules/devops/runbooks/zastava-deployment.md` in sync so Zastava deployments reuse the same env contract.

## 9. Testing Strategy

- Unit tests for each resolver/validator.
- Integration tests for Worker & Observer verifying that missing configuration causes deterministic failures.
- Golden tests for configuration precedence (component overrides, defaults).

## 10. Open Questions / Future Work

- Dynamic refresh of environment (watch ConfigMap) is out of scope for v1.
- Evaluate adding support for environment discovery via `IConfiguration` only (no env vars) for Windows service deployments.

## 11. References

- Surface.FS Design (`docs/modules/scanner/design/surface-fs.md`)
- Surface.Secrets Design (`docs/modules/scanner/design/surface-secrets.md`)
- Surface.Validation Design (`docs/modules/scanner/design/surface-validation.md`)
- AirGap mode overview (`docs/airgap/airgap-mode.md`)