Files
git.stella-ops.org/docs/modules/scanner/design/surface-env.md
master 61f963fd52
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Implement ledger metrics for observability and add tests for Ruby packages endpoints
- Added `LedgerMetrics` class to record write latency and total events for ledger operations.
- Created comprehensive tests for Ruby packages endpoints, covering scenarios for missing inventory, successful retrieval, and identifier handling.
- Introduced `TestSurfaceSecretsScope` for managing environment variables during tests.
- Developed `ProvenanceMongoExtensions` for attaching DSSE provenance and trust information to event documents.
- Implemented `EventProvenanceWriter` and `EventWriter` classes for managing event provenance in MongoDB.
- Established MongoDB indexes for efficient querying of events based on provenance and trust.
- Added models and JSON parsing logic for DSSE provenance and trust information.
2025-11-13 09:29:09 +02:00

207 lines
14 KiB
Markdown
Raw Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Surface.Env Design (Epic: SURFACE-SHARING)
> **Status:** Draft v1.0 — aligns with tasks `SURFACE-ENV-01..05`, `SCANNER-ENV-01..03`, `ZASTAVA-ENV-01..02`, `OPS-ENV-01`.
>
> **Audience:** Scanner Worker/WebService engineers, Zastava engineers, DevOps/Ops teams.
## 1. Goals
Surface.Env centralises configuration discovery for every component that touches the shared Scanner “surface” (cache, manifests, secrets). The library replaces ad-hoc environment lookups with a deterministic, validated contract that:
1. Works identically across Scanner Worker, Scanner WebService, BuildX plug-ins, Zastava Observer/Webhook, and future consumers (Scheduler planners, CLI runners).
2. Supports both connected and air-gapped deployments with clear defaults.
3. Records configuration intent (tenant isolation, cache limits, TLS, feature flags) so Surface.Validation can enforce preconditions before any work executes.
## 2. Architecture Overview
```
+-----------------------+
| Host (Worker/WebSvc) |
| - IConfiguration |
| - ILogger |
| |
| +-----------------+ |
| | SurfaceEnv | | loads env vars / config file
| | - Provider |--+------------------------------+
| | - Validators | |
| +-----------------+ |
| | |
| | IResolvedSurfaceConfiguration |
| v v
| Surface.FS / Surface.Secrets / Surface.Validation consumers
+-------------------------------------------------------------
```
Surface.Env exposes `ISurfaceEnvironment` which returns an immutable `SurfaceEnvironmentSettings` record. Hosts call `SurfaceEnvBuilder.Build()` during startup, passing optional configuration overrides (for example, Helm chart values). The builder resolves environment variables, applies defaults, and executes Surface.Validation rules before handing settings to downstream services.
## 3. Configuration Schema
### 3.1 Common keys
| Variable | Description | Default | Notes |
|----------|-------------|---------|-------|
| `SCANNER_SURFACE_FS_ENDPOINT` | Base URI for Surface.FS / RustFS / S3-compatible store. | _required_ | Throws `SurfaceEnvironmentException` when `RequireSurfaceEndpoint = true`. When disabled (tests), builder falls back to `https://surface.invalid` so validation can fail fast. Also binds `Surface:Fs:Endpoint` from `IConfiguration`. |
| `SCANNER_SURFACE_FS_BUCKET` | Bucket/container used for manifests and artefacts. | `surface-cache` | Must be unique per tenant; validators enforce non-empty value. |
| `SCANNER_SURFACE_FS_REGION` | Optional region for S3-compatible stores. | `null` | Needed only when the backing store requires it (AWS/GCS). |
| `SCANNER_SURFACE_CACHE_ROOT` | Local directory for warm caches. | `<temp>/stellaops/surface` | Directory is created if missing. Override to `/var/lib/stellaops/surface` (or another fast SSD) in production. |
| `SCANNER_SURFACE_CACHE_QUOTA_MB` | Soft limit for on-disk cache usage. | `4096` | Enforced range 64262144 MB; validation emits `SURFACE_ENV_CACHE_QUOTA_INVALID` outside the range. |
| `SCANNER_SURFACE_PREFETCH_ENABLED` | Enables manifest prefetch threads. | `false` | Workers honour this before analyzer execution. |
| `SCANNER_SURFACE_TENANT` | Tenant namespace used by cache + secret resolvers. | `TenantResolver(...)` or `"default"` | Default resolver may pull from Authority claims; you can override via env for multi-tenant pools. |
| `SCANNER_SURFACE_FEATURES` | Comma-separated feature switches. | `""` | Compared against `SurfaceEnvironmentOptions.KnownFeatureFlags`; unknown flags raise warnings. |
| `SCANNER_SURFACE_TLS_CERT_PATH` | Path to PEM/PKCS#12 file for client auth. | `null` | When present, `SurfaceEnvironmentBuilder` loads the certificate into `SurfaceTlsConfiguration`. |
| `SCANNER_SURFACE_TLS_KEY_PATH` | Optional private-key path when cert/key are stored separately. | `null` | Stored in `SurfaceTlsConfiguration` for hosts that need to hydrate the key themselves. |
### 3.2 Secrets provider keys
| Variable | Description | Notes |
|----------|-------------|-------|
| `SCANNER_SURFACE_SECRETS_PROVIDER` | Provider ID (`kubernetes`, `file`, `inline`, future back-ends). | Defaults to `kubernetes`; validators reject unknown values via `SURFACE_SECRET_PROVIDER_UNKNOWN`. |
| `SCANNER_SURFACE_SECRETS_ROOT` | Path or base namespace for the provider. | Required for the `file` provider (e.g., `/etc/stellaops/secrets`). |
| `SCANNER_SURFACE_SECRETS_NAMESPACE` | Kubernetes namespace used by the secrets provider. | Mandatory when `provider = kubernetes`. |
| `SCANNER_SURFACE_SECRETS_FALLBACK_PROVIDER` | Optional secondary provider ID. | Enables tiered lookups (e.g., `kubernetes``inline`) without changing code. |
| `SCANNER_SURFACE_SECRETS_ALLOW_INLINE` | Allows returning inline secrets (useful for tests). | Defaults to `false`; Production deployments should keep this disabled. |
| `SCANNER_SURFACE_SECRETS_TENANT` | Tenant override for secret lookups. | Defaults to `SCANNER_SURFACE_TENANT` or the tenant resolver result. |
### 3.3 Component-specific prefixes
`SurfaceEnvironmentOptions.Prefixes` controls the order in which suffixes are probed. Every suffix listed above is combined with each prefix (e.g., `SCANNER_SURFACE_FS_ENDPOINT`, `ZASTAVA_SURFACE_FS_ENDPOINT`) and finally the bare suffix (`SURFACE_FS_ENDPOINT`). Configure prefixes per host so local overrides win but global scanner defaults remain available:
| Component | Suggested prefixes (first match wins) | Notes |
|-----------|---------------------------------------|-------|
| Scanner.Worker / WebService | `SCANNER` | Default already added by `AddSurfaceEnvironment`. |
| Zastava Observer/Webhook (planned) | `ZASTAVA`, `SCANNER` | Call `options.AddPrefix("ZASTAVA")` before relying on `ZASTAVA_*` overrides. |
| Future CLI / BuildX plug-ins | `CLI`, `SCANNER` | Allows per-user overrides without breaking shared env files. |
This approach means operators can define a single env file (SCANNER_*) and only override the handful of settings that diverge for a specific component by introducing an additional prefix.
### 3.4 Configuration precedence
The builder resolves every suffix using the following precedence:
1. Environment variables using the configured prefixes (e.g., `ZASTAVA_SURFACE_FS_ENDPOINT`, then `SCANNER_SURFACE_FS_ENDPOINT`, then the bare `SURFACE_FS_ENDPOINT`).
2. Configuration values under the `Surface:*` section (for example `Surface:Fs:Endpoint`, `Surface:Cache:Root` in `appsettings.json` or Helm values).
3. Hard-coded defaults baked into `SurfaceEnvironmentBuilder` (temporary directory, `surface-cache` bucket, etc.).
`SurfaceEnvironmentOptions.RequireSurfaceEndpoint` controls whether a missing endpoint results in an exception (default: `true`). Other values fall back to the default listed in §3.1/3.2 and are further validated by the Surface.Validation pipeline.
## 4. API Surface
```csharp
public interface ISurfaceEnvironment
{
SurfaceEnvironmentSettings Settings { get; }
IReadOnlyDictionary<string, string> RawVariables { get; }
}
public sealed record SurfaceEnvironmentSettings(
Uri SurfaceFsEndpoint,
string SurfaceFsBucket,
string? SurfaceFsRegion,
DirectoryInfo CacheRoot,
int CacheQuotaMegabytes,
bool PrefetchEnabled,
IReadOnlyCollection<string> FeatureFlags,
SurfaceSecretsConfiguration Secrets,
string Tenant,
SurfaceTlsConfiguration Tls)
{
public DateTimeOffset CreatedAtUtc { get; init; }
}
public sealed record SurfaceSecretsConfiguration(
string Provider,
string Tenant,
string? Root,
string? Namespace,
string? FallbackProvider,
bool AllowInline);
public sealed record SurfaceTlsConfiguration(
string? CertificatePath,
string? PrivateKeyPath,
X509Certificate2Collection? ClientCertificates);
```
`ISurfaceEnvironment.RawVariables` captures the exact env/config keys that produced the snapshot so operators can export them in diagnostics bundles.
`SurfaceEnvironmentOptions` configures how the snapshot is built:
* `ComponentName` used in logs/validation output.
* `Prefixes` ordered list of env prefixes (see §3.3). Defaults to `["SCANNER"]`.
* `RequireSurfaceEndpoint` throw when no endpoint is provided (default `true`).
* `TenantResolver` delegate invoked when `SCANNER_SURFACE_TENANT` is absent.
* `KnownFeatureFlags` recognised feature switches; unexpected values raise warnings.
Example registration:
```csharp
builder.Services.AddSurfaceEnvironment(options =>
{
options.ComponentName = "Scanner.Worker";
options.AddPrefix("ZASTAVA"); // optional future override
options.KnownFeatureFlags.Add("validation");
options.TenantResolver = sp => sp.GetRequiredService<ITenantContext>().TenantId;
});
```
Consumers access `ISurfaceEnvironment.Settings` and pass the record into Surface.FS, Surface.Secrets, cache, and validation helpers. The interface memoises results so repeated access is cheap.
## 5. Validation
`SurfaceEnvironmentBuilder` only throws `SurfaceEnvironmentException` for malformed inputs (non-integer quota, invalid URI, missing required variable when `RequireSurfaceEndpoint = true`). The richer validation pipeline lives in `StellaOps.Scanner.Surface.Validation` and runs via `services.AddSurfaceValidation()`:
1. **SurfaceEndpointValidator** checks for a non-placeholder endpoint and bucket (`SURFACE_ENV_MISSING_ENDPOINT`, `SURFACE_FS_BUCKET_MISSING`).
2. **SurfaceCacheValidator** verifies the cache directory exists/is writable and that the quota is positive (`SURFACE_ENV_CACHE_DIR_UNWRITABLE`, `SURFACE_ENV_CACHE_QUOTA_INVALID`).
3. **SurfaceSecretsValidator** validates provider names, required namespace/root fields, and tenant presence (`SURFACE_SECRET_PROVIDER_UNKNOWN`, `SURFACE_SECRET_CONFIGURATION_MISSING`, `SURFACE_ENV_TENANT_MISSING`).
Validators emit `SurfaceValidationIssue` instances with codes defined in `SurfaceValidationIssueCodes`. `LoggingSurfaceValidationReporter` writes structured log entries (Info/Warning/Error) using the component name, issue code, and remediation hint. Hosts fail startup if any issue has `Error` severity; warnings allow startup but surface actionable hints.
## 6. Integration Guidance
- **Scanner Worker**: register `AddSurfaceEnvironment`, `AddSurfaceValidation`, `AddSurfaceFileCache`, and `AddSurfaceSecrets` before analyzer/services (see `src/Scanner/StellaOps.Scanner.Worker/Program.cs`). `SurfaceCacheOptionsConfigurator` already binds the cache root from `ISurfaceEnvironment`.
- **Scanner WebService**: identical wiring, plus `SurfacePointerService`/`ScannerSurfaceSecretConfigurator` reuse the resolved settings (`Program.cs` demonstrates the pattern).
- **Zastava Observer/Webhook**: will reuse the same helper once the service adds `AddSurfaceEnvironment(options => options.AddPrefix("ZASTAVA"))` so per-component overrides function without diverging defaults.
- **Scheduler / CLI / BuildX (future)**: treat `ISurfaceEnvironment` as read-only input; secret lookup, cache plumbing, and validation happen before any queue/enqueue work.
Readiness probes should invoke `ISurfaceValidatorRunner` (registered by `AddSurfaceValidation`) and fail the endpoint when any issue is returned. The Scanner Worker/WebService hosted services already run the validators on startup; other consumers should follow the same pattern.
### 6.1 Validation output
`LoggingSurfaceValidationReporter` produces log entries that include:
```
Surface validation issue for component Scanner.Worker: SURFACE_ENV_MISSING_ENDPOINT - Surface FS endpoint is missing or invalid. Hint: Set SCANNER_SURFACE_FS_ENDPOINT to the RustFS/S3 endpoint.
```
Treat `SurfaceValidationIssueCodes.*` with severity `Error` as hard blockers (readiness must fail). `Warning` entries flag configuration drift (for example, missing namespaces) but allow startup so staging/offline runs can proceed. The codes appear in both the structured log state and the reporter payload, making it easy to alert on them.
## 7. Security & Observability
- Surface.Env never logs raw values; only suffix names and issue codes appear in logs. `RawVariables` is intended for diagnostics bundles and should be treated as sensitive metadata.
- TLS certificates are loaded into memory and not re-serialised; only the configured paths are exposed to downstream services.
- To emit metrics, register a custom `ISurfaceValidationReporter` (e.g., wrapping Prometheus counters) in addition to the logging reporter.
## 8. Offline & Air-Gap Support
- Defaults assume no public network access; point `SCANNER_SURFACE_FS_ENDPOINT` at an internal RustFS/S3 mirror.
- Offline bundles must capture an env file (Ops track this under the Offline Kit tasks) so operators can seed `SCANNER_*` values before first boot.
- Keep `docs/modules/devops/runbooks/zastava-deployment.md` in sync so Zastava deployments reuse the same env contract.
## 9. Testing Strategy
- Unit tests for each resolver/validator.
- Integration tests for Worker & Observer verifying that missing configuration causes deterministic failures.
- Golden tests for configuration precedence (component overrides, defaults).
## 10. Open Questions / Future Work
- Dynamic refresh of environment (watch ConfigMap) is out of scope for v1.
- Evaluate adding support for environment discovery via `IConfiguration` only (no env vars) for Windows service deployments.
## 11. References
- Surface.FS Design (`docs/modules/scanner/design/surface-fs.md`)
- Surface.Secrets Design (`docs/modules/scanner/design/surface-secrets.md`)
- Surface.Validation Design (`docs/modules/scanner/design/surface-validation.md`)
- AirGap mode overview (`docs/airgap/airgap-mode.md`)