Implement ledger metrics for observability and add tests for Ruby packages endpoints
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled
- Added `LedgerMetrics` class to record write latency and total events for ledger operations. - Created comprehensive tests for Ruby packages endpoints, covering scenarios for missing inventory, successful retrieval, and identifier handling. - Introduced `TestSurfaceSecretsScope` for managing environment variables during tests. - Developed `ProvenanceMongoExtensions` for attaching DSSE provenance and trust information to event documents. - Implemented `EventProvenanceWriter` and `EventWriter` classes for managing event provenance in MongoDB. - Established MongoDB indexes for efficient querying of events based on provenance and trust. - Added models and JSON parsing logic for DSSE provenance and trust information.
This commit is contained in:
@@ -40,35 +40,49 @@ Surface.Env exposes `ISurfaceEnvironment` which returns an immutable `SurfaceEnv
|
||||
|
||||
| Variable | Description | Default | Notes |
|
||||
|----------|-------------|---------|-------|
|
||||
| `SCANNER_SURFACE_FS_ENDPOINT` | Base URI for Surface.FS service (RustFS, S3-compatible). | _required_ | e.g. `https://surface-cache.svc.cluster.local`. Zastava uses `ZASTAVA_SURFACE_FS_ENDPOINT`; when absent, falls back to scanner value. |
|
||||
| `SCANNER_SURFACE_FS_BUCKET` | Bucket/container name used for manifests and artefacts. | `surface-cache` | Must be unique per tenant. |
|
||||
| `SCANNER_SURFACE_FS_REGION` | Optional region (S3-style). | `null` | Required for AWS S3. |
|
||||
| `SCANNER_SURFACE_CACHE_ROOT` | Local filesystem directory for warm caches. | `/var/lib/stellaops/surface` | Should reside on fast SSD. |
|
||||
| `SCANNER_SURFACE_CACHE_QUOTA_MB` | Soft limit for local cache usage. | `4096` | Enforced by Surface.FS eviction policy. |
|
||||
| `SCANNER_SURFACE_TLS_CERT_PATH` | Path to PEM bundle for mutual TLS with Surface.FS. | `null` | If provided, library loads cert/key pair. |
|
||||
| `SCANNER_SURFACE_TENANT` | Tenant identifier used for cache namespaces. | derived from Authority token | Can be overridden for multi-tenant workers. |
|
||||
| `SCANNER_SURFACE_PREFETCH_ENABLED` | Toggle surface prefetch threads. | `false` | If `true`, Worker prefetches manifests before analyzer stage. |
|
||||
| `SCANNER_SURFACE_FEATURES` | Comma-separated feature switches. | `""` | e.g. `validation,prewarm,runtime-diff`. |
|
||||
| `SCANNER_SURFACE_FS_ENDPOINT` | Base URI for Surface.FS / RustFS / S3-compatible store. | _required_ | Throws `SurfaceEnvironmentException` when `RequireSurfaceEndpoint = true`. When disabled (tests), builder falls back to `https://surface.invalid` so validation can fail fast. Also binds `Surface:Fs:Endpoint` from `IConfiguration`. |
|
||||
| `SCANNER_SURFACE_FS_BUCKET` | Bucket/container used for manifests and artefacts. | `surface-cache` | Must be unique per tenant; validators enforce non-empty value. |
|
||||
| `SCANNER_SURFACE_FS_REGION` | Optional region for S3-compatible stores. | `null` | Needed only when the backing store requires it (AWS/GCS). |
|
||||
| `SCANNER_SURFACE_CACHE_ROOT` | Local directory for warm caches. | `<temp>/stellaops/surface` | Directory is created if missing. Override to `/var/lib/stellaops/surface` (or another fast SSD) in production. |
|
||||
| `SCANNER_SURFACE_CACHE_QUOTA_MB` | Soft limit for on-disk cache usage. | `4096` | Enforced range 64–262144 MB; validation emits `SURFACE_ENV_CACHE_QUOTA_INVALID` outside the range. |
|
||||
| `SCANNER_SURFACE_PREFETCH_ENABLED` | Enables manifest prefetch threads. | `false` | Workers honour this before analyzer execution. |
|
||||
| `SCANNER_SURFACE_TENANT` | Tenant namespace used by cache + secret resolvers. | `TenantResolver(...)` or `"default"` | Default resolver may pull from Authority claims; you can override via env for multi-tenant pools. |
|
||||
| `SCANNER_SURFACE_FEATURES` | Comma-separated feature switches. | `""` | Compared against `SurfaceEnvironmentOptions.KnownFeatureFlags`; unknown flags raise warnings. |
|
||||
| `SCANNER_SURFACE_TLS_CERT_PATH` | Path to PEM/PKCS#12 file for client auth. | `null` | When present, `SurfaceEnvironmentBuilder` loads the certificate into `SurfaceTlsConfiguration`. |
|
||||
| `SCANNER_SURFACE_TLS_KEY_PATH` | Optional private-key path when cert/key are stored separately. | `null` | Stored in `SurfaceTlsConfiguration` for hosts that need to hydrate the key themselves. |
|
||||
|
||||
### 3.2 Secrets provider keys
|
||||
|
||||
| Variable | Description | Notes |
|
||||
|----------|-------------|-------|
|
||||
| `SCANNER_SURFACE_SECRETS_PROVIDER` | Provider ID (`kubernetes`, `file`, `inline`). | Controls Surface.Secrets back-end. |
|
||||
| `SCANNER_SURFACE_SECRETS_ROOT` | Path or secret namespace. | Example: `/etc/stellaops/secrets` for file provider. |
|
||||
| `SCANNER_SURFACE_SECRETS_TENANT` | Tenant override for secret lookup. | Defaults to `SCANNER_SURFACE_TENANT`. |
|
||||
| `SCANNER_SURFACE_SECRETS_PROVIDER` | Provider ID (`kubernetes`, `file`, `inline`, future back-ends). | Defaults to `kubernetes`; validators reject unknown values via `SURFACE_SECRET_PROVIDER_UNKNOWN`. |
|
||||
| `SCANNER_SURFACE_SECRETS_ROOT` | Path or base namespace for the provider. | Required for the `file` provider (e.g., `/etc/stellaops/secrets`). |
|
||||
| `SCANNER_SURFACE_SECRETS_NAMESPACE` | Kubernetes namespace used by the secrets provider. | Mandatory when `provider = kubernetes`. |
|
||||
| `SCANNER_SURFACE_SECRETS_FALLBACK_PROVIDER` | Optional secondary provider ID. | Enables tiered lookups (e.g., `kubernetes` → `inline`) without changing code. |
|
||||
| `SCANNER_SURFACE_SECRETS_ALLOW_INLINE` | Allows returning inline secrets (useful for tests). | Defaults to `false`; Production deployments should keep this disabled. |
|
||||
| `SCANNER_SURFACE_SECRETS_TENANT` | Tenant override for secret lookups. | Defaults to `SCANNER_SURFACE_TENANT` or the tenant resolver result. |
|
||||
|
||||
### 3.3 Zastava-specific keys
|
||||
### 3.3 Component-specific prefixes
|
||||
|
||||
Zastava containers read the same primary variables but may override names under the `ZASTAVA_` prefix (e.g., `ZASTAVA_SURFACE_CACHE_ROOT`, `ZASTAVA_SURFACE_FEATURES`). Surface.Env automatically checks component-specific prefixes before falling back to the scanner defaults.
|
||||
`SurfaceEnvironmentOptions.Prefixes` controls the order in which suffixes are probed. Every suffix listed above is combined with each prefix (e.g., `SCANNER_SURFACE_FS_ENDPOINT`, `ZASTAVA_SURFACE_FS_ENDPOINT`) and finally the bare suffix (`SURFACE_FS_ENDPOINT`). Configure prefixes per host so local overrides win but global scanner defaults remain available:
|
||||
|
||||
| Component | Suggested prefixes (first match wins) | Notes |
|
||||
|-----------|---------------------------------------|-------|
|
||||
| Scanner.Worker / WebService | `SCANNER` | Default – already added by `AddSurfaceEnvironment`. |
|
||||
| Zastava Observer/Webhook (planned) | `ZASTAVA`, `SCANNER` | Call `options.AddPrefix("ZASTAVA")` before relying on `ZASTAVA_*` overrides. |
|
||||
| Future CLI / BuildX plug-ins | `CLI`, `SCANNER` | Allows per-user overrides without breaking shared env files. |
|
||||
|
||||
This approach means operators can define a single env file (SCANNER_*) and only override the handful of settings that diverge for a specific component by introducing an additional prefix.
|
||||
|
||||
### 3.4 Configuration precedence
|
||||
|
||||
1. Explicit overrides passed to `SurfaceEnvBuilder` (e.g., from appsettings).
|
||||
2. Component-specific env (e.g., `ZASTAVA_SURFACE_FS_ENDPOINT`).
|
||||
3. Scanner global env (e.g., `SCANNER_SURFACE_FS_ENDPOINT`).
|
||||
4. `SurfaceEnvDefaults.json` (shipped with library for sensible defaults).
|
||||
5. Emergency fallback values defined in code (only for development scenarios).
|
||||
The builder resolves every suffix using the following precedence:
|
||||
|
||||
1. Environment variables using the configured prefixes (e.g., `ZASTAVA_SURFACE_FS_ENDPOINT`, then `SCANNER_SURFACE_FS_ENDPOINT`, then the bare `SURFACE_FS_ENDPOINT`).
|
||||
2. Configuration values under the `Surface:*` section (for example `Surface:Fs:Endpoint`, `Surface:Cache:Root` in `appsettings.json` or Helm values).
|
||||
3. Hard-coded defaults baked into `SurfaceEnvironmentBuilder` (temporary directory, `surface-cache` bucket, etc.).
|
||||
|
||||
`SurfaceEnvironmentOptions.RequireSurfaceEndpoint` controls whether a missing endpoint results in an exception (default: `true`). Other values fall back to the default listed in § 3.1/3.2 and are further validated by the Surface.Validation pipeline.
|
||||
|
||||
## 4. API Surface
|
||||
|
||||
@@ -79,65 +93,99 @@ public interface ISurfaceEnvironment
|
||||
IReadOnlyDictionary<string, string> RawVariables { get; }
|
||||
}
|
||||
|
||||
public sealed record SurfaceEnvironmentSettings
|
||||
(
|
||||
public sealed record SurfaceEnvironmentSettings(
|
||||
Uri SurfaceFsEndpoint,
|
||||
string SurfaceFsBucket,
|
||||
string? SurfaceFsRegion,
|
||||
DirectoryInfo CacheRoot,
|
||||
int CacheQuotaMegabytes,
|
||||
X509Certificate2Collection? ClientCertificates,
|
||||
string Tenant,
|
||||
bool PrefetchEnabled,
|
||||
IReadOnlyCollection<string> FeatureFlags,
|
||||
SecretProviderConfiguration Secrets,
|
||||
IDictionary<string,string> ComponentOverrides
|
||||
);
|
||||
SurfaceSecretsConfiguration Secrets,
|
||||
string Tenant,
|
||||
SurfaceTlsConfiguration Tls)
|
||||
{
|
||||
public DateTimeOffset CreatedAtUtc { get; init; }
|
||||
}
|
||||
|
||||
public sealed record SurfaceSecretsConfiguration(
|
||||
string Provider,
|
||||
string Tenant,
|
||||
string? Root,
|
||||
string? Namespace,
|
||||
string? FallbackProvider,
|
||||
bool AllowInline);
|
||||
|
||||
public sealed record SurfaceTlsConfiguration(
|
||||
string? CertificatePath,
|
||||
string? PrivateKeyPath,
|
||||
X509Certificate2Collection? ClientCertificates);
|
||||
```
|
||||
|
||||
Consumers access `ISurfaceEnvironment.Settings` and pass the record into Surface.FS / Surface.Secrets factories. The interface memoises results so repeated access is cheap.
|
||||
`ISurfaceEnvironment.RawVariables` captures the exact env/config keys that produced the snapshot so operators can export them in diagnostics bundles.
|
||||
|
||||
`SurfaceEnvironmentOptions` configures how the snapshot is built:
|
||||
|
||||
* `ComponentName` – used in logs/validation output.
|
||||
* `Prefixes` – ordered list of env prefixes (see § 3.3). Defaults to `["SCANNER"]`.
|
||||
* `RequireSurfaceEndpoint` – throw when no endpoint is provided (default `true`).
|
||||
* `TenantResolver` – delegate invoked when `SCANNER_SURFACE_TENANT` is absent.
|
||||
* `KnownFeatureFlags` – recognised feature switches; unexpected values raise warnings.
|
||||
|
||||
Example registration:
|
||||
|
||||
```csharp
|
||||
builder.Services.AddSurfaceEnvironment(options =>
|
||||
{
|
||||
options.ComponentName = "Scanner.Worker";
|
||||
options.AddPrefix("ZASTAVA"); // optional future override
|
||||
options.KnownFeatureFlags.Add("validation");
|
||||
options.TenantResolver = sp => sp.GetRequiredService<ITenantContext>().TenantId;
|
||||
});
|
||||
```
|
||||
|
||||
Consumers access `ISurfaceEnvironment.Settings` and pass the record into Surface.FS, Surface.Secrets, cache, and validation helpers. The interface memoises results so repeated access is cheap.
|
||||
|
||||
## 5. Validation
|
||||
|
||||
Surface.Env invokes the following validators (implemented in Surface.Validation):
|
||||
`SurfaceEnvironmentBuilder` only throws `SurfaceEnvironmentException` for malformed inputs (non-integer quota, invalid URI, missing required variable when `RequireSurfaceEndpoint = true`). The richer validation pipeline lives in `StellaOps.Scanner.Surface.Validation` and runs via `services.AddSurfaceValidation()`:
|
||||
|
||||
1. **EndpointValidator** – ensures endpoint URI is absolute HTTPS and not localhost in production.
|
||||
2. **CacheQuotaValidator** – verifies quota > 0 and below host max.
|
||||
3. **FilesystemValidator** – checks cache root exists/writable; attempts to create directory if missing.
|
||||
4. **SecretsProviderValidator** – ensures provider-specific settings (e.g., Kubernetes namespace) are present.
|
||||
5. **FeatureFlagValidator** – warns on unknown feature flag tokens.
|
||||
1. **SurfaceEndpointValidator** – checks for a non-placeholder endpoint and bucket (`SURFACE_ENV_MISSING_ENDPOINT`, `SURFACE_FS_BUCKET_MISSING`).
|
||||
2. **SurfaceCacheValidator** – verifies the cache directory exists/is writable and that the quota is positive (`SURFACE_ENV_CACHE_DIR_UNWRITABLE`, `SURFACE_ENV_CACHE_QUOTA_INVALID`).
|
||||
3. **SurfaceSecretsValidator** – validates provider names, required namespace/root fields, and tenant presence (`SURFACE_SECRET_PROVIDER_UNKNOWN`, `SURFACE_SECRET_CONFIGURATION_MISSING`, `SURFACE_ENV_TENANT_MISSING`).
|
||||
|
||||
Failures throw `SurfaceEnvironmentException` with error codes (`SURFACE_ENV_MISSING_ENDPOINT`, `SURFACE_ENV_CACHE_DIR_UNWRITABLE`, etc.). Hosts log the error and fail fast during startup.
|
||||
Validators emit `SurfaceValidationIssue` instances with codes defined in `SurfaceValidationIssueCodes`. `LoggingSurfaceValidationReporter` writes structured log entries (Info/Warning/Error) using the component name, issue code, and remediation hint. Hosts fail startup if any issue has `Error` severity; warnings allow startup but surface actionable hints.
|
||||
|
||||
## 6. Integration Guidance
|
||||
|
||||
- **Scanner Worker**: call `services.AddSurfaceEnvironment()` in `Program.cs` before registering analyzers. Pass `hostContext.Configuration.GetSection("Surface")` for overrides.
|
||||
- **Scanner WebService**: build environment during startup using `AddSurfaceEnvironment`, `AddSurfaceValidation`, `AddSurfaceFileCache`, and `AddSurfaceSecrets`; readiness checks execute the validator runner and scan/report APIs emit Surface CAS pointers derived from the resolved configuration.
|
||||
- **Zastava Observer/Webhook**: use the same builder; ensure Helm charts set `ZASTAVA_` variables.
|
||||
- **Scheduler Planner (future)**: treat Surface.Env as read-only input; do not mutate settings.
|
||||
- `Scanner.Worker` and `Scanner.WebService` automatically bind the `SurfaceCacheOptions.RootDirectory` to `SurfaceEnvironment.Settings.CacheRoot` (2025-11-05); both hosts emit structured warnings (`surface.env.misconfiguration`) when the helper detects missing cache roots, endpoints, or secrets provider settings (2025-11-06).
|
||||
- **Scanner Worker**: register `AddSurfaceEnvironment`, `AddSurfaceValidation`, `AddSurfaceFileCache`, and `AddSurfaceSecrets` before analyzer/services (see `src/Scanner/StellaOps.Scanner.Worker/Program.cs`). `SurfaceCacheOptionsConfigurator` already binds the cache root from `ISurfaceEnvironment`.
|
||||
- **Scanner WebService**: identical wiring, plus `SurfacePointerService`/`ScannerSurfaceSecretConfigurator` reuse the resolved settings (`Program.cs` demonstrates the pattern).
|
||||
- **Zastava Observer/Webhook**: will reuse the same helper once the service adds `AddSurfaceEnvironment(options => options.AddPrefix("ZASTAVA"))` so per-component overrides function without diverging defaults.
|
||||
- **Scheduler / CLI / BuildX (future)**: treat `ISurfaceEnvironment` as read-only input; secret lookup, cache plumbing, and validation happen before any queue/enqueue work.
|
||||
|
||||
### 6.1 Misconfiguration warnings
|
||||
Readiness probes should invoke `ISurfaceValidatorRunner` (registered by `AddSurfaceValidation`) and fail the endpoint when any issue is returned. The Scanner Worker/WebService hosted services already run the validators on startup; other consumers should follow the same pattern.
|
||||
|
||||
Surface.Env surfaces actionable warnings that appear in structured logs and readiness responses:
|
||||
### 6.1 Validation output
|
||||
|
||||
- `surface.env.cache_root_missing` – emitted when the resolved cache directory does not exist or is not writable. The host attempts to create the directory once; subsequent failures block startup.
|
||||
- `surface.env.endpoint_unreachable` – emitted when `SurfaceFsEndpoint` is missing or not an absolute HTTPS URI.
|
||||
- `surface.env.secrets_provider_invalid` – emitted when the configured secrets provider lacks mandatory fields (e.g., `SCANNER_SURFACE_SECRETS_ROOT` for the `file` provider).
|
||||
`LoggingSurfaceValidationReporter` produces log entries that include:
|
||||
|
||||
Each warning includes remediation text and a reference to this design document; operations runbooks should treat these warnings as blockers in production and as validation hints in staging.
|
||||
```
|
||||
Surface validation issue for component Scanner.Worker: SURFACE_ENV_MISSING_ENDPOINT - Surface FS endpoint is missing or invalid. Hint: Set SCANNER_SURFACE_FS_ENDPOINT to the RustFS/S3 endpoint.
|
||||
```
|
||||
|
||||
Treat `SurfaceValidationIssueCodes.*` with severity `Error` as hard blockers (readiness must fail). `Warning` entries flag configuration drift (for example, missing namespaces) but allow startup so staging/offline runs can proceed. The codes appear in both the structured log state and the reporter payload, making it easy to alert on them.
|
||||
|
||||
## 7. Security & Observability
|
||||
|
||||
- Never log raw secrets; Surface.Env redacts values by default.
|
||||
- Emit metric `surface_env_validation_total{status}` to observe validation outcomes.
|
||||
- Provide `/metrics` gauge for cache quota/residual via Surface.FS integration.
|
||||
- Surface.Env never logs raw values; only suffix names and issue codes appear in logs. `RawVariables` is intended for diagnostics bundles and should be treated as sensitive metadata.
|
||||
- TLS certificates are loaded into memory and not re-serialised; only the configured paths are exposed to downstream services.
|
||||
- To emit metrics, register a custom `ISurfaceValidationReporter` (e.g., wrapping Prometheus counters) in addition to the logging reporter.
|
||||
|
||||
## 8. Offline & Air-Gap Support
|
||||
|
||||
- Defaults assume no public network access; endpoints should point to internal RustFS or S3-compatible system.
|
||||
- Offline kit templates supply env files under `offline/scanner/surface-env.env`.
|
||||
- Document steps in `docs/modules/devops/runbooks/zastava-deployment.md` and `offline-kit` tasks for synchronising env values.
|
||||
- Defaults assume no public network access; point `SCANNER_SURFACE_FS_ENDPOINT` at an internal RustFS/S3 mirror.
|
||||
- Offline bundles must capture an env file (Ops track this under the Offline Kit tasks) so operators can seed `SCANNER_*` values before first boot.
|
||||
- Keep `docs/modules/devops/runbooks/zastava-deployment.md` in sync so Zastava deployments reuse the same env contract.
|
||||
|
||||
## 9. Testing Strategy
|
||||
|
||||
|
||||
Reference in New Issue
Block a user