Surface.Validation Design (Epic: SURFACE-SHARING)

Status: v1.1 (2025-11-23) — aligned to Surface.Secrets schema (surface-secrets-schema.md) and Surface.Env release 0.1.0-alpha.20251123; covers tasks SURFACE-VAL-01..05, LANG-SURFACE-01..03, ENTRYTRACE-SURFACE-01..02, ZASTAVA-SURFACE-02, SCANNER-SECRETS-01..03.

Audience: Engineers integrating Surface Env/FS/Secrets, QA guild, Security guild.

1. Objectives

Surface.Validation provides a shared validator framework to ensure all surface consumers meet configuration and data preconditions before performing work. It prevents subtle runtime errors by failing fast with actionable diagnostics.

2. Core Interfaces

public interface ISurfaceValidator
{
    ValueTask<SurfaceValidationResult> ValidateAsync(SurfaceValidationContext context, CancellationToken ct = default);
}

public sealed record SurfaceValidationContext(
    IServiceProvider Services,
    string ComponentName,
    SurfaceEnvironmentSettings Environment,
    IReadOnlyDictionary<string, object?> Properties)
{
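    // Convenience factory. Substituting an empty property bag for null is an
    // assumed default; the design does not specify the body.
    public static SurfaceValidationContext Create(
        IServiceProvider services,
        string componentName,
        SurfaceEnvironmentSettings environment,
        IReadOnlyDictionary<string, object?>? properties = null)
        => new(services, componentName, environment,
               properties ?? new Dictionary<string, object?>());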
}

public interface ISurfaceValidatorRunner
{
    ValueTask<SurfaceValidationResult> RunAllAsync(SurfaceValidationContext context, CancellationToken ct = default);
    ValueTask EnsureAsync(SurfaceValidationContext context, CancellationToken ct = default);
}

public sealed record SurfaceValidationIssue(
    string Code,
    string Message,
    SurfaceValidationSeverity Severity,
    string? Hint = null);

Properties carries optional context-specific metadata (e.g., jobId, imageDigest, cache paths) so that validators can tailor diagnostics without resolving additional services. Validators are registered through DI (services.AddSurfaceValidation()). Hosts call ISurfaceValidatorRunner.RunAllAsync() during startup and before workload execution to catch misconfiguration early; EnsureAsync() throws on failure when Surface:Validation:ThrowOnFailure=true.
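As an illustration of how a validator might consume the context, the following is a minimal sketch, not the shipped implementation. SurfaceValidationResult, SurfaceValidationSeverity and SurfaceEnvironmentSettings are not defined in this document, so the shapes below (including IsSuccess, the SurfaceFsEndpoint property type, EmptyServices and EndpointPresenceValidator) are hypothetical stand-ins.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins: this document does not define these types.
public enum SurfaceValidationSeverity { Warning, Error }

public sealed record SurfaceValidationIssue(
    string Code, string Message, SurfaceValidationSeverity Severity, string? Hint = null);

public sealed record SurfaceValidationResult(IReadOnlyList<SurfaceValidationIssue> Issues)
{
    // Assumed convention: a run succeeds when it contains no Error-level issue.
    public bool IsSuccess => Issues.All(i => i.Severity != SurfaceValidationSeverity.Error);
}

public sealed record SurfaceEnvironmentSettings(Uri? SurfaceFsEndpoint);

public sealed record SurfaceValidationContext(
    IServiceProvider Services,
    string ComponentName,
    SurfaceEnvironmentSettings Environment,
    IReadOnlyDictionary<string, object?> Properties);

public interface ISurfaceValidator
{
    ValueTask<SurfaceValidationResult> ValidateAsync(
        SurfaceValidationContext context, CancellationToken ct = default);
}

// Empty service provider so the sketch runs without a DI container.
public sealed class EmptyServices : IServiceProvider
{
    public object? GetService(Type serviceType) => null;
}

// Example validator: reports SURFACE_ENV_MISSING_ENDPOINT and uses the optional
// "jobId" property to tailor the message without resolving extra services.
public sealed class EndpointPresenceValidator : ISurfaceValidator
{
    public ValueTask<SurfaceValidationResult> ValidateAsync(
        SurfaceValidationContext context, CancellationToken ct = default)
    {
        var issues = new List<SurfaceValidationIssue>();
        if (context.Environment.SurfaceFsEndpoint is null)
        {
            var job = context.Properties.TryGetValue("jobId", out var id) && id is not null
                ? $" (job {id})"
                : string.Empty;
            issues.Add(new SurfaceValidationIssue(
                "SURFACE_ENV_MISSING_ENDPOINT",
                $"{context.ComponentName}: Surface.FS endpoint is not configured{job}.",
                SurfaceValidationSeverity.Error,
                "Set the Surface.FS endpoint in the environment settings."));
        }
        return ValueTask.FromResult(new SurfaceValidationResult(issues));
    }
}
```

In a real host the validator would be resolved from DI and invoked by the runner; here it can be called directly with a hand-built context to see the issue it produces.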

3. Built-in Validators

| Code | Severity | Description |
| --- | --- | --- |
| SURFACE_ENV_MISSING_ENDPOINT | Error | Raised when SurfaceFsEndpoint absent. |
| SURFACE_ENV_CACHE_DIR_UNWRITABLE | Error | Cache root not writable or disk full. |
| SURFACE_SECRET_MISSING | Error | Secret provider cannot locate required secret type. |
| SURFACE_SECRET_STALE | Warning | Secret older than rotation window. |
| SURFACE_SECRET_FORMAT_INVALID | Error | Secret payload fails schema validation per surface-secrets-schema.md. |
| SURFACE_FS_ENDPOINT_REACHABILITY | Error | HEAD request to Surface.FS endpoint failed. |
| SURFACE_FS_BUCKET_MISMATCH | Error | Provided bucket does not exist / lacks permissions. |
| SURFACE_FEATURE_UNKNOWN | Warning | Feature flag not recognised. |
| SURFACE_TENANT_MISMATCH | Error | Tenant from environment differs from Authority token tenant. |

The validation pipeline stops at the first issue with severity Error unless Surface:Validation:ContinueOnError=true is set, which is useful in diagnostics mode.
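The stop-on-first-error behaviour can be sketched as a simple loop. This is an illustration under stated assumptions rather than the runner's actual code: ValidatorStep is a hypothetical simplification of ISurfaceValidator, the enum members are limited to the two severities listed in the table, and continueOnError stands in for the Surface:Validation:ContinueOnError setting.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Linq;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical stand-ins for types this document does not define in full.
public enum SurfaceValidationSeverity { Warning, Error }

public sealed record SurfaceValidationIssue(
    string Code, string Message, SurfaceValidationSeverity Severity, string? Hint = null);

// Simplified validator shape used only for this sketch.
public delegate ValueTask<IReadOnlyList<SurfaceValidationIssue>> ValidatorStep(CancellationToken ct);

public static class PipelineSketch
{
    // Runs validators in order. Stops after the first validator that reports an
    // Error-level issue unless continueOnError is true.
    public static async ValueTask<IReadOnlyList<SurfaceValidationIssue>> RunAllAsync(
        IEnumerable<ValidatorStep> validators,
        bool continueOnError,
        CancellationToken ct = default)
    {
        var collected = new List<SurfaceValidationIssue>();
        foreach (var validator in validators)
        {
            ct.ThrowIfCancellationRequested();
            var issues = await validator(ct);
            collected.AddRange(issues);

            var hasError = issues.Any(i => i.Severity == SurfaceValidationSeverity.Error);
            if (hasError && !continueOnError)
            {
                break; // fail fast: later validators are not executed
            }
        }
        return collected;
    }
}
```

With continueOnError set to true the loop visits every validator, so a diagnostics run reports all issues at once instead of only the first blocking one.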

4. Extensibility

Consumers can register custom validators:

services.AddSurfaceValidation(builder =>
    builder.AddValidator<RegistryCredentialsValidator>()
           .AddValidator<RuntimeCacheWarmupValidator>());

Validators can access DI services (e.g., HttpClient, Authority token provider) through the context. To avoid long-running checks, each validator should complete within the recommended maximum of 500 ms.
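The document only recommends the 500 ms ceiling and does not say whether it is enforced. One possible way for a validator to hold itself to that budget is a linked cancellation token, sketched below; ValidatorBudget and RunWithBudgetAsync are hypothetical helper names, and the approach only works if the wrapped check honours the token it is given.

```csharp
#nullable enable
using System;
using System.Threading;
using System.Threading.Tasks;

public static class ValidatorBudget
{
    // Recommended per-validator ceiling taken from the design text.
    public static readonly TimeSpan Recommended = TimeSpan.FromMilliseconds(500);

    // Runs a check with a time budget. The check must observe the token;
    // work that ignores cancellation will not be interrupted.
    public static async Task<T> RunWithBudgetAsync<T>(
        Func<CancellationToken, Task<T>> check,
        TimeSpan budget,
        CancellationToken ct = default)
    {
        using var cts = CancellationTokenSource.CreateLinkedTokenSource(ct);
        cts.CancelAfter(budget);
        try
        {
            return await check(cts.Token);
        }
        catch (OperationCanceledException) when (!ct.IsCancellationRequested)
        {
            // The caller did not cancel, so the budget must have elapsed.
            throw new TimeoutException(
                $"Validator exceeded its {budget.TotalMilliseconds} ms budget.");
        }
    }
}
```

A validator could wrap an HTTP reachability probe in RunWithBudgetAsync and translate the TimeoutException into an issue; which issue code to use for a timeout is not specified here.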

Secrets schema binding

  • Validators must validate the secrets configuration against the JSON Schema defined in surface-secrets-schema.md, and reject it when provider/fallback ids are outside the allowed set or when file permissions differ from 0600.
  • When Surface:Secrets:Provider = file, ensure each required secret exists at <root>/<tenant>/<component>/<secretType>/<name>.json with base64 payload matching the secret type contract (see §2 in surface-secrets-schema.md).
  • Inline provider must be disabled in production (AllowInline=false); validation emits SURFACE_SECRET_FORMAT_INVALID if enabled without an explicit dev/test flag.
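The file-provider checks above could look roughly like the following sketch. FileSecretLayout and its members are hypothetical names; the JSON structure of the secret file is defined in surface-secrets-schema.md and is deliberately not reproduced here, so only the path layout, the 0600 mode check and a base64 well-formedness test are shown. File.GetUnixFileMode requires .NET 7 or later, and treating Windows as always passing the mode check is an assumption.

```csharp
#nullable enable
using System;
using System.IO;

public static class FileSecretLayout
{
    // Builds <root>/<tenant>/<component>/<secretType>/<name>.json
    public static string BuildPath(
        string root, string tenant, string component, string secretType, string name)
        => Path.Combine(root, tenant, component, secretType, name + ".json");

    // True when the file mode is exactly 0600 (owner read/write only).
    public static bool HasOwnerOnlyPermissions(string path)
    {
        if (OperatingSystem.IsWindows())
        {
            return true; // assumption: Unix mode bits do not apply on Windows
        }
        var mode = File.GetUnixFileMode(path);
        return mode == (UnixFileMode.UserRead | UnixFileMode.UserWrite);
    }

    // True when the payload is non-empty, well-formed base64. Whether the decoded
    // bytes satisfy the secret type contract is a separate, schema-driven check.
    public static bool IsBase64(string payload)
    {
        if (string.IsNullOrEmpty(payload))
        {
            return false;
        }
        var buffer = new byte[payload.Length];
        return Convert.TryFromBase64String(payload, buffer, out _);
    }
}
```

A validator built on these helpers would report SURFACE_SECRET_MISSING when the file at the computed path does not exist and SURFACE_SECRET_FORMAT_INVALID when the payload check fails.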

5. Reporting & Observability

  • Results are exposed via ISurfaceValidationReporter; the default implementation logs structured JSON to the Validation category.
  • Metrics: surface_validation_issues_total{code,severity}.
  • Optional debug endpoint /internal/surface/validation (Scanner WebService) returns the last validation run.
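The counter named above could be emitted with System.Diagnostics.Metrics, as in this minimal sketch. The class name SurfaceValidationMetrics and the meter name are hypothetical; only the instrument name and the code and severity labels come from this document, and how the metric is exported is left to the host.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Diagnostics.Metrics;

public sealed class SurfaceValidationMetrics : IDisposable
{
    private readonly Meter _meter;
    private readonly Counter<long> _issues;

    // The default meter name is an assumption made for this sketch.
    public SurfaceValidationMetrics(string meterName = "StellaOps.Surface.Validation")
    {
        _meter = new Meter(meterName);
        _issues = _meter.CreateCounter<long>("surface_validation_issues_total");
    }

    // Increments the counter once per issue, tagged with its code and severity.
    public void RecordIssue(string code, string severity) =>
        _issues.Add(
            1,
            new KeyValuePair<string, object?>("code", code),
            new KeyValuePair<string, object?>("severity", severity));

    public void Dispose() => _meter.Dispose();
}
```

A reporter would call RecordIssue for every issue in a validation result, which keeps the label set bounded by the number of issue codes and severities.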

6. Integration Guidelines

  • Scanner Worker/WebService: fail startup if any error-level issue occurs; log warnings but continue running.
  • Scanner EntryTrace: execute RunAllAsync for each scan job with properties {imageDigest, jobId, configPath, rootPath}. If the result contains errors, skip analysis and log the issue summary instead of failing the entire scan.
  • Zastava Webhook: treat validation errors as fatal (the webhook should not enforce policies when surface preconditions fail). Display the validation error summary in the /readyz response to aid debugging.
  • Analysers: call SurfaceValidation.Ensure() before executing heavy work to catch misconfiguration during integration tests.
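For the EntryTrace guideline, the per-job property bag and the skip decision might be sketched as follows. EntryTraceJobGate, BuildProperties and ShouldAnalyse are hypothetical names, the enum is a stand-in limited to the severities listed in section 3, and the summary format is an assumption.

```csharp
#nullable enable
using System.Collections.Generic;
using System.Linq;

// Hypothetical stand-ins for types this document does not define in full.
public enum SurfaceValidationSeverity { Warning, Error }

public sealed record SurfaceValidationIssue(
    string Code, string Message, SurfaceValidationSeverity Severity, string? Hint = null);

public static class EntryTraceJobGate
{
    // Property bag passed as SurfaceValidationContext.Properties for one scan job.
    public static IReadOnlyDictionary<string, object?> BuildProperties(
        string imageDigest, string jobId, string configPath, string rootPath) =>
        new Dictionary<string, object?>
        {
            ["imageDigest"] = imageDigest,
            ["jobId"] = jobId,
            ["configPath"] = configPath,
            ["rootPath"] = rootPath,
        };

    // Returns false when any Error-level issue is present, so the caller can skip
    // analysis and log the summary instead of failing the whole scan.
    public static bool ShouldAnalyse(
        IEnumerable<SurfaceValidationIssue> issues, out string summary)
    {
        var errors = issues
            .Where(i => i.Severity == SurfaceValidationSeverity.Error)
            .Select(i => $"{i.Code}: {i.Message}")
            .ToList();
        summary = string.Join("; ", errors);
        return errors.Count == 0;
    }
}
```

Warnings do not block analysis in this sketch, which mirrors the Scanner Worker/WebService rule of logging warnings but continuing to run.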

7. Testing Strategy

  • Unit tests for built-in validators using in-memory providers.
  • Integration tests in Scanner/Zastava verifying validators run during startup and produce expected outcomes.
  • Negative tests simulating missing secrets, unreachable endpoints, or mismatched tenants.

8. Error Handling & Remediation

  • Each issue includes a hint describing remediation steps (e.g., “Verify SCANNER_SURFACE_FS_ENDPOINT is reachable from worker nodes”).
  • DevOps runbooks should reference issue codes in troubleshooting sections.
  • A surface_validation.json file, stored alongside the application logs, summarises the last run for offline support.
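Writing the summary file could be done with System.Text.Json, as sketched below. The file schema is not specified in this document, so ValidationRunSummary, IssueSummary and their fields are hypothetical, as is the write-to-temp-then-move step used to avoid leaving a half-written file.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.IO;
using System.Text.Json;

// Hypothetical summary shape; the real schema is not defined in this document.
public sealed record IssueSummary(string Code, string Severity, string Message, string? Hint);

public sealed record ValidationRunSummary(
    string Component, DateTimeOffset CompletedAt, IReadOnlyList<IssueSummary> Issues);

public static class ValidationSummaryFile
{
    private static readonly JsonSerializerOptions Options = new()
    {
        WriteIndented = true,
        PropertyNamingPolicy = JsonNamingPolicy.CamelCase,
    };

    // Writes surface_validation.json into the log directory and returns its path.
    public static string Write(string logDirectory, ValidationRunSummary summary)
    {
        Directory.CreateDirectory(logDirectory);
        var path = Path.Combine(logDirectory, "surface_validation.json");
        var temp = path + ".tmp";

        // Write to a temporary file first, then replace the previous summary.
        File.WriteAllText(temp, JsonSerializer.Serialize(summary, Options));
        File.Move(temp, path, overwrite: true);
        return path;
    }
}
```

Each run overwrites the previous file, so the file always reflects the last run only, as the bullet above describes.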

9. References

  • docs/modules/scanner/design/surface-env.md
  • docs/modules/scanner/design/surface-fs.md
  • docs/modules/scanner/design/surface-secrets.md
  • docs/modules/devops/runbooks/zastava-deployment.md