Harden runtime HTTP transport lifecycles
# Sprint 20260405-011 - Transport Pooling And Attribution Hardening

## Topic & Scope
- Standardize runtime transport client lifecycle and attribution so Stella Ops services stop producing anonymous or churn-heavy long-lived connections.
- Extend the shared PostgreSQL infrastructure first, then patch the known PostgreSQL and Valkey runtime hotspots to use named, reusable clients.
- Continue the hardening pass through the first HTTP lifecycle hotspots where service-owned runtime code still allocates raw `HttpClient` instances.
- Add static guardrails and focused tests so raw runtime transport construction does not re-enter the codebase unnoticed.
- Working directory: `src/__Libraries/`.
- Expected evidence: shared infrastructure tests, targeted service/runtime validation, updated transport/database docs, and sprint-linked before/after findings.

## Dependencies & Concurrency
- Depends on `docs/implplan/SPRINT_20260405_008_Integrations_consul_pg_router_runtime_tuning.md` for the PostgreSQL runtime logging baseline.
- Depends on `docs/implplan/SPRINT_20260405_010_AdvisoryAI_pg_pooling_and_gitea_spike_followup.md` for the proven AdvisoryAI regression pattern and remediation baseline.
- Cross-module edits allowed for `src/AdvisoryAI/**`, `src/Attestor/**`, `src/Authority/**`, `src/BinaryIndex/**`, `src/Concelier/**`, `src/Doctor/**`, `src/EvidenceLocker/**`, `src/Findings/**`, `src/Graph/**`, `src/Integrations/**`, `src/JobEngine/**`, `src/Notify/**`, `src/Platform/**`, `src/Policy/**`, `src/ReachGraph/**`, `src/ReleaseOrchestrator/**`, `src/Scanner/**`, `src/Signals/**`, `src/Timeline/**`, `src/Router/**`, `src/Plugin/**`, `docs/**`, and `devops/**` when they consume the shared transport conventions.

## Documentation Prerequisites
- `docs/code-of-conduct/CODE_OF_CONDUCT.md`
- `docs/code-of-conduct/TESTING_PRACTICES.md`
- `docs/README.md`
- `docs/07_HIGH_LEVEL_ARCHITECTURE.md`
- `docs/modules/platform/architecture-overview.md`
- `docs/db/RULES.md`
- `src/__Libraries/AGENTS.md`
- `src/__Libraries/StellaOps.Infrastructure.Postgres/AGENTS.md`
- `src/__Tests/AGENTS.md`

## Delivery Tracker

### XPORT-STD-001 - Extend shared PostgreSQL transport policy
Status: DONE
Dependency: none
Owners: Developer
Task description:
- Add stable application-name support and complete pooling policy propagation to the shared PostgreSQL options/base infrastructure so module-level data sources can be named and tuned without ad hoc code.
- Update the shared library docs and tests so the behavior is explicit and regression-safe.

Completion criteria:
- [x] Shared PostgreSQL options expose stable runtime application-name configuration.
- [x] Shared data-source construction applies application name plus the full pooling policy, including idle lifetime.
- [x] Infrastructure.Postgres tests cover the new policy behavior.
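
The policy merge described in XPORT-STD-001 can be sketched as follows. This is a minimal, language-neutral Python illustration, not the shared C# code: it treats the connection string as naive `Key=Value;` pairs (no quoting), uses Npgsql keyword spellings (`Application Name`, `Minimum Pool Size`, `Maximum Pool Size`, `Connection Idle Lifetime`), and assumes that keys the caller already set take precedence over the shared defaults. `PoolingPolicy`, its default values, and `apply_policy` are hypothetical names.

```python
from dataclasses import dataclass
from typing import Dict


@dataclass(frozen=True)
class PoolingPolicy:
    # Hypothetical shared defaults; real values would come from the shared options.
    min_pool_size: int = 0
    max_pool_size: int = 50
    connection_idle_lifetime_s: int = 300


def parse(conn_str: str) -> Dict[str, str]:
    """Split a naive Key=Value;... string (no quoting or escaping support)."""
    pairs: Dict[str, str] = {}
    for part in conn_str.split(";"):
        if not part.strip():
            continue
        key, _, value = part.partition("=")
        pairs[key.strip()] = value.strip()
    return pairs


def apply_policy(conn_str: str, application_name: str, policy: PoolingPolicy) -> str:
    """Fill in the application name and pooling keys the caller did not set."""
    pairs = parse(conn_str)
    present = {key.lower() for key in pairs}  # compare keywords case-insensitively
    defaults = {
        "Application Name": application_name,
        "Minimum Pool Size": str(policy.min_pool_size),
        "Maximum Pool Size": str(policy.max_pool_size),
        "Connection Idle Lifetime": str(policy.connection_idle_lifetime_s),
    }
    for key, value in defaults.items():
        if key.lower() not in present:
            pairs[key] = value
    return ";".join(f"{key}={value}" for key, value in pairs.items())


if __name__ == "__main__":
    print(apply_policy("Host=db;Database=findings;Maximum Pool Size=20",
                       "stellaops-findings", PoolingPolicy()))
```

In the .NET code the equivalent step would normally go through `NpgsqlConnectionStringBuilder` (which exposes `ApplicationName`, `MinPoolSize`, `MaxPoolSize`, and `ConnectionIdleLifetime`) rather than string manipulation.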

### XPORT-RUNTIME-002 - Patch runtime PostgreSQL callers and service bootstraps
Status: DONE
Dependency: XPORT-STD-001
Owners: Developer
Task description:
- Convert the currently known runtime hotspots and service bootstraps to named, reusable PostgreSQL data sources instead of anonymous or ad hoc construction.
- Prioritize the services already identified in live runtime evidence: Findings, JobEngine, EvidenceLocker, AdvisoryAI/OpsMemory, ReachGraph, and Scanner reachability paths.

Completion criteria:
- [x] Touched runtime services stop constructing anonymous PostgreSQL data sources in their steady-state code paths.
- [x] Hot-path repositories touched by this sprint use reusable data sources/providers instead of raw connection strings where practical.
- [x] Compose/runtime-facing defaults or docs are updated when a touched service gains a new attribution/pooling option.
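
One way to read "named, reusable data sources" is a per-name cache that builds each long-lived source once and hands the same instance to every later caller. The Python sketch below assumes a caller-supplied `factory` that builds the underlying pool (in the .NET code, presumably an `NpgsqlDataSource`); `NamedSourceRegistry` is a hypothetical helper, not a type from the shared library.

```python
import threading
from typing import Callable, Dict, Generic, TypeVar

T = TypeVar("T")


class NamedSourceRegistry(Generic[T]):
    """Build one long-lived source per logical name and reuse it afterwards."""

    def __init__(self, factory: Callable[[str], T]) -> None:
        self._factory = factory
        self._sources: Dict[str, T] = {}
        self._lock = threading.Lock()

    def get(self, name: str) -> T:
        # The lock makes "check, then build" atomic, so concurrent callers
        # asking for the same name never create two pools.
        with self._lock:
            if name not in self._sources:
                self._sources[name] = self._factory(name)
            return self._sources[name]


if __name__ == "__main__":
    built = []

    def build_pool(name: str) -> dict:
        built.append(name)  # stands in for an expensive pool construction
        return {"application_name": f"stellaops-{name}"}

    registry = NamedSourceRegistry(build_pool)
    registry.get("findings")
    registry.get("findings")
    print(built)
```

The design choice being illustrated is that steady-state code asks for a source by name rather than by connection string, so pool construction happens once per name instead of once per repository call.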

### XPORT-GUARD-003 - Add static guardrails for runtime transport construction
Status: DONE
Dependency: XPORT-STD-001
Owners: Developer
Task description:
- Add a focused convention test that scans runtime code for forbidden raw transport construction patterns and documents the allowlisted exceptions (tests, migrations, CLI setup, one-shot diagnostics).
- Cover PostgreSQL first, then include the agreed non-PostgreSQL transport patterns where the current implementation can enforce them deterministically.

Completion criteria:
- [x] A deterministic test fails on forbidden runtime transport construction patterns outside the allowlist.
- [x] The allowlist is explicit and narrow.
- [x] The guardrail is documented in the relevant shared docs/sprint notes.
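
A convention test of this kind can be approximated by a deterministic line scan with an explicit allowlist. The Python sketch below is illustrative only: the two regexes mirror PostgreSQL patterns named under Decisions & Risks, while the allowlist globs and the in-memory `files` map are assumptions standing in for a real walk of `src/`.

```python
import fnmatch
import re
from typing import Dict, List, Tuple

# Illustrative rules modeled on the PostgreSQL patterns this sprint forbids.
FORBIDDEN = {
    "anonymous-data-source": re.compile(r"\bNpgsqlDataSource\.Create\s*\("),
    "raw-connection": re.compile(r"\bnew\s+NpgsqlConnection\s*\("),
}

# Hypothetical allowlist globs: test projects, migrations, and CLI setup code.
ALLOWLIST = ("*/__Tests/*", "*/Migrations/*", "src/Cli/*")


def scan(files: Dict[str, str]) -> List[Tuple[str, int, str]]:
    """Return (path, line number, rule) for every forbidden match outside the allowlist."""
    violations: List[Tuple[str, int, str]] = []
    for path in sorted(files):  # sorted so the report order is deterministic
        if any(fnmatch.fnmatchcase(path, glob) for glob in ALLOWLIST):
            continue
        for lineno, line in enumerate(files[path].splitlines(), start=1):
            for rule, pattern in FORBIDDEN.items():
                if pattern.search(line):
                    violations.append((path, lineno, rule))
    return violations


if __name__ == "__main__":
    sample = {
        "src/Findings/Repo.cs": "var ds = NpgsqlDataSource.Create(cs);\n",
        "src/Findings/__Tests/RepoTests.cs": "using var c = new NpgsqlConnection(cs);\n",
    }
    for violation in scan(sample):
        print(violation)
```

Matching uses `fnmatch.fnmatchcase` so results do not depend on the host file system's case rules, and paths are sorted so the violation list is stable between runs.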

### XPORT-VALKEY-004 - Stamp runtime Valkey client identity and extend guardrails
Status: DONE
Dependency: XPORT-GUARD-003
Owners: Developer
Task description:
- Stamp stable `ClientName` defaults across the runtime Valkey/Redis multiplexer construction paths that were still anonymous in service code or shared queue/cache transport helpers.
- Extend the shared convention test so unnamed runtime `ConnectionMultiplexer.Connect(...)` / `ConnectAsync(...)` usage fails outside explicit CLI/tooling/test exceptions.

Completion criteria:
- [x] Touched runtime Valkey/Redis multiplexer paths stamp stable client identity before connecting.
- [x] The shared convention suite fails on unnamed runtime Valkey multiplexer construction outside a narrow allowlist.
- [x] Shared transport rules and touched module task boards reference the new Valkey attribution standard.
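
The client-identity rule can be sketched against a StackExchange.Redis-style configuration string, where the `name=` option sets `ConfigurationOptions.ClientName`. The Python helper below is a hypothetical illustration of the fallback order the sprint describes: keep an explicit `name=`, else use the service-specific name, else the module-level default. In the C# code the same decision would be applied to `ConfigurationOptions` before `ConnectionMultiplexer.Connect(...)` is called.

```python
from typing import Optional


def stamp_client_name(configuration: str, service_name: Optional[str], module_default: str) -> str:
    """Ensure a StackExchange.Redis-style configuration string carries name=..."""
    tokens = [token.strip() for token in configuration.split(",") if token.strip()]
    if not any(token.lower().startswith("name=") for token in tokens):
        # Prefer the service-specific identity; fall back to the module default.
        tokens.append(f"name={service_name or module_default}")
    return ",".join(tokens)


if __name__ == "__main__":
    print(stamp_client_name("valkey:6379,abortConnect=false", None, "stellaops-scanner-queue"))
```

The service and module names used here are made up; the point is that identity is decided before connecting, so no runtime multiplexer reaches the server unnamed.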

### XPORT-HTTP-005 - Remove raw runtime HttpClient allocation from first-wave host paths
Status: DONE
Dependency: XPORT-GUARD-003
Owners: Developer
Task description:
- Patch the known host-owned HTTP lifecycle hotspots so they no longer allocate ad hoc `HttpClient` instances in steady-state runtime paths.
- Prefer named `IHttpClientFactory` clients where the host owns DI, and use compatibility-safe shared fallbacks only where the current plugin/controller seam still cannot require factory-backed construction.

Completion criteria:
- [x] Platform identity-provider connection tests use a named factory-backed client with no raw fallback allocation.
- [x] Attestor TrustRepo online/offline registrations resolve TUF HTTP via a named factory-backed client.
- [x] Shared HTTP hotspot regression coverage and docs capture the first hardening wave without claiming repo-wide HTTP enforcement.
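
The preference order in XPORT-HTTP-005 (named factory client first, shared compatibility fallback second, never a per-call allocation) is sketched below in Python. `HttpTransport`, `resolve_client`, and the `shared-compat` name are hypothetical stand-ins for a pooled HTTP client and its resolution helper; the point is only that the fallback is created once per process rather than once per call.

```python
import threading
from typing import Callable, Optional


class HttpTransport:
    """Hypothetical stand-in for a pooled HTTP client."""

    def __init__(self, name: str) -> None:
        self.name = name


_shared_fallback: Optional[HttpTransport] = None
_fallback_lock = threading.Lock()


def resolve_client(name: str,
                   factory: Optional[Callable[[str], HttpTransport]] = None) -> HttpTransport:
    """Prefer a named factory-backed client; otherwise reuse one shared fallback."""
    global _shared_fallback
    if factory is not None:
        return factory(name)  # host-owned DI path: the factory manages pooling
    with _fallback_lock:
        if _shared_fallback is None:
            # Created once per process, never once per call.
            _shared_fallback = HttpTransport("shared-compat")
        return _shared_fallback
```

Seams that can inject a factory pass it in; legacy seams call `resolve_client(name)` without one and still share a single instance.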

### XPORT-HTTP-006 - Extend HTTP lifecycle hardening through plugin seams and legacy connector wrappers
Status: DONE
Dependency: XPORT-HTTP-005
Owners: Developer
Task description:
- Make the Integrations plugin loading seam DI-aware so built-in connector plugins can consume factory-backed runtime clients without reflection-only constructor limits.
- Patch the next HTTP hotspot wave across Integrations feed/object mirror plugins, ReleaseOrchestrator legacy vault/registry connectors, and OCI helper fallbacks so runtime code no longer allocates per-call or ad hoc `HttpClient` instances along those paths.

Completion criteria:
- [x] Integration plugin loading supports service-provider-backed activation for runtime plugins while preserving no-DI compatibility.
- [x] Integrations built-in feed/object plugins use factory-backed or shared compatibility clients instead of raw per-call `HttpClient` construction.
- [x] Legacy ReleaseOrchestrator token/auth helper paths and OCI fallback helpers move onto shared compatibility clients, and the shared hotspot convention test covers the touched files.
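
DI-aware activation with a no-DI fallback can be sketched as follows. This Python version is hypothetical: a plain dict keyed by constructor parameter name stands in for the service provider, and `activate_plugin` and `FeedMirrorPlugin` are illustrative names. In .NET, `ActivatorUtilities.CreateInstance` from `Microsoft.Extensions.DependencyInjection` is one common way to get constructor injection, but the sprint notes do not say which mechanism the Integrations loader uses.

```python
import inspect
from typing import Any, Dict, Optional, Type


def activate_plugin(plugin_type: Type[Any], services: Optional[Dict[str, Any]] = None) -> Any:
    """Construct a plugin through constructor injection when services exist."""
    if services is None:
        return plugin_type()  # no-DI compatibility path: default constructor only
    kwargs: Dict[str, Any] = {}
    for name, param in inspect.signature(plugin_type.__init__).parameters.items():
        if name == "self" or param.kind in (inspect.Parameter.VAR_POSITIONAL,
                                            inspect.Parameter.VAR_KEYWORD):
            continue
        if name in services:
            kwargs[name] = services[name]
        elif param.default is inspect.Parameter.empty:
            raise LookupError(f"{plugin_type.__name__} needs unregistered service '{name}'")
    return plugin_type(**kwargs)


class FeedMirrorPlugin:
    """Hypothetical built-in plugin that can run with or without an injected client."""

    def __init__(self, http_client: Any = None) -> None:
        self.http_client = http_client
```

Plugins keep an optional constructor dependency, so reflection-only callers still get a working instance while DI-backed hosts supply the named client.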

## Execution Log
| Date (UTC) | Update | Owner |
| --- | --- | --- |
| 2026-04-05 | Sprint created to turn the AdvisoryAI pooling fix into a repo-wide transport hardening pass across shared PostgreSQL infrastructure, runtime callers, and static guardrails. | Developer |
| 2026-04-05 | Added shared PostgreSQL application-name policy, patched the first runtime caller wave (JobEngine, EvidenceLocker, Platform, AdvisoryAI/OpsMemory, ReachGraph, Scanner, Router transport, Plugin registry, VexLens, Findings, ExportCenter, Replay), and added convention coverage for anonymous runtime data-source creation. | Developer |
| 2026-04-05 | Validation: `dotnet test src/__Libraries/__Tests/StellaOps.Infrastructure.Postgres.Tests/StellaOps.Infrastructure.Postgres.Tests.csproj` passed `79/79` under Microsoft.Testing.Platform, and targeted `dotnet build` runs for JobEngine.WebService, EvidenceLocker.Infrastructure, Scanner.Reachability, Platform.WebService, OpsMemory.WebService, ReachGraph.WebService, ExportCenter.Infrastructure, Replay.WebService, RiskEngine.Infrastructure, and ReleaseOrchestrator.PolicyGate all passed. | Developer |
| 2026-04-05 | Patched the second PostgreSQL runtime wave (Attestor Watchlist/Persistence/Rekor checkpoint store, BinaryIndex.Validation, Concelier.ProofService.Postgres, Doctor.WebService report storage, and Graph saved views) to use named reusable data sources and extended the convention test to fail on raw runtime `NpgsqlConnection` outside an explicit allowlist. | Developer |
| 2026-04-05 | Validation: targeted `dotnet build` runs for Attestor.Watchlist, Attestor.Persistence, Attestor.Core, BinaryIndex.Validation, Concelier.ProofService.Postgres, Doctor.WebService, and Graph.Api all passed; `dotnet test src/__Libraries/__Tests/StellaOps.Infrastructure.Postgres.Tests/StellaOps.Infrastructure.Postgres.Tests.csproj` passed `80/80`. | Developer |
| 2026-04-05 | Patched the runtime Valkey wave across Signals, BinaryIndex, ReachGraph, Attestor, Platform, Authority, Policy, JobEngine Scheduler, Scanner queue/cache/webservice paths, Notify queue paths, Timeline indexer, Router Valkey transport/gateway rate limiting, and Concelier cache so steady-state multiplexer construction stamps stable `ClientName` values. | Developer |
| 2026-04-05 | Validation: targeted `dotnet build` runs for Signals, BinaryIndex.WebService, ReachGraph.WebService, Attestor.Infrastructure, Platform.WebService, Authority, Policy.Engine, Scheduler.Queue, Scheduler.WebService, Scanner.Queue, Scanner.CallGraph, Scanner.WebService, Notify.Queue, TimelineIndexer.Infrastructure, Messaging.Transport.Valkey, Router.Gateway, and Concelier.Cache.Valkey all passed; `dotnet test src/__Libraries/__Tests/StellaOps.Infrastructure.Postgres.Tests/StellaOps.Infrastructure.Postgres.Tests.csproj` passed `81/81`. | Developer |
| 2026-04-05 | Patched the first HTTP lifecycle wave across Platform identity-provider probing, Attestor TrustRepo online/offline TUF registration, shared Artifact HTTP fetch, Integrations Vault client wiring, and the S3-compatible integration plugin fallback so these host-owned paths no longer allocate ad hoc runtime `HttpClient` instances. | Developer |
| 2026-04-05 | Validation: `dotnet build src/Integrations/StellaOps.Integrations.WebService/StellaOps.Integrations.WebService.csproj` and `dotnet build src/__Libraries/StellaOps.Artifact.Core/StellaOps.Artifact.Core.csproj` passed; `dotnet test src/Attestor/__Libraries/__Tests/StellaOps.Attestor.TrustRepo.Tests/StellaOps.Attestor.TrustRepo.Tests.csproj` passed `21/21`; `dotnet test src/Integrations/__Tests/StellaOps.Integrations.Plugin.Tests/StellaOps.Integrations.Plugin.Tests.csproj` passed `17/17`; `dotnet test src/__Libraries/__Tests/StellaOps.Infrastructure.Postgres.Tests/StellaOps.Infrastructure.Postgres.Tests.csproj` passed `82/82`. A full `dotnet test src/Platform/__Tests/StellaOps.Platform.WebService.Tests/StellaOps.Platform.WebService.Tests.csproj` run completed with two unrelated existing failures in `SeedEndpointsTests.SeedDemo_WhenAuthorizationFails_ReturnsForbidden` and `QuotaEndpointsTests.Quotas_ReturnDeterministicOrder`; the new identity-provider HTTP wiring compiled and ran inside that assembly pass. | Developer |
| 2026-04-05 | Patched the second HTTP lifecycle wave by making the shared plugin loader service-provider aware, moving Integrations feed/object built-ins onto named/shared compatibility HTTP clients, routing ReleaseOrchestrator legacy vault/registry connectors through shared compatibility wrappers, and replacing raw OCI fallback client allocation in Verdict and TrustVerdict helpers. | Developer |
| 2026-04-05 | Validation: `dotnet build src/Integrations/StellaOps.Integrations.WebService/StellaOps.Integrations.WebService.csproj`, `dotnet build src/ReleaseOrchestrator/__Libraries/StellaOps.ReleaseOrchestrator.IntegrationHub/StellaOps.ReleaseOrchestrator.IntegrationHub.csproj`, and `dotnet build src/__Libraries/StellaOps.Verdict/StellaOps.Verdict.csproj` passed; `dotnet test src/Integrations/__Tests/StellaOps.Integrations.Tests/StellaOps.Integrations.Tests.csproj` passed with the new DI-aware plugin loader coverage; `dotnet test src/Attestor/__Libraries/StellaOps.Attestor.TrustVerdict.Tests/StellaOps.Attestor.TrustVerdict.Tests.csproj` passed; `dotnet test src/__Libraries/__Tests/StellaOps.Infrastructure.Postgres.Tests/StellaOps.Infrastructure.Postgres.Tests.csproj` passed with the expanded HTTP hotspot allowlist. | Developer |

## Decisions & Risks
- The first implementation wave standardizes PostgreSQL fully and applies the same lifecycle/attribution rule to other transports only where the existing runtime code already exposes a shared construction seam.
- Tests, migrations, CLI setup, and one-shot admin checks are not treated as runtime transport violations unless they share code with steady-state service paths.
- Cross-module service patches will be kept minimal and tied back to the shared standard rather than introducing per-service bespoke option models where the shared library can carry the behavior.
- The static guardrail now fails on anonymous `NpgsqlDataSource.Create(...)`, unnamed `NpgsqlDataSourceBuilder`, and raw runtime `NpgsqlConnection` usage outside an explicit allowlist.
- The Valkey convention guardrail now also fails unnamed runtime `ConnectionMultiplexer.Connect(...)` / `ConnectAsync(...)` call sites outside explicit CLI/tooling/test exceptions.
- The first shared HTTP guardrail is intentionally narrow: it covers the known host-owned hotspot files patched in this sprint, while broader repo-wide HTTP enforcement remains a follow-up because several legacy connectors and tools still create transport-specific temporary clients.
- Integrations now activates connector plugins through DI when a service provider is available, which lets built-in runtime plugins consume named factory-backed clients without breaking reflection-only callers that still rely on default construction.
- ReleaseOrchestrator legacy connectors still do not use `IHttpClientFactory`; this sprint moves them onto a shared-handler compatibility wrapper so token/auth flows stop allocating temporary clients while preserving the current plugin contract.
- The remaining explicit raw-connection allowlist is intentionally narrow: CLI/setup, migrations, diagnostics, `PlatformMigrationAdminService`, and `Workflow`'s PostgreSQL store. `Workflow` remains allowlisted because `src/Workflow/AGENTS.md` is missing, which blocks implementer-side runtime edits under the repo contract.
- Shared Valkey factories that do not receive a service-specific name now apply a module-level fallback `ClientName`; this restores baseline attribution, but Router transport callers may still want a future option for per-service Valkey identity.
- Shared transport rules are documented in `docs/technical/runtime-transport-client-rules.md`.
- HTTP compatibility fallbacks now live behind module-specific wrappers (`Integrations` shared defaults, `ReleaseOrchestrator` shared-handler connector clients, OCI helper shared clients) so hotspot files no longer construct raw clients directly; broader HTTP sweeps should continue to replace the remaining wrappers with true host-managed factories where possible.

## Next Checkpoints
- Start the next transport hardening wave with the blocked `Workflow` PostgreSQL store once the module adds `AGENTS.md`. Then continue the remaining broader HTTP/SCM/Vault-style lifecycle sweep (ReleaseOrchestrator SCM/cloud connectors, any remaining tool-specific temporary clients, and factory adoption for the compatibility wrappers added here) with the same guardrail approach.

docs/technical/runtime-transport-client-rules.md
# Runtime Transport Client Rules

This document defines the minimum lifecycle and attribution rules for long-lived runtime transport clients in Stella Ops services.

## PostgreSQL
- Steady-state runtime code must use named reusable `NpgsqlDataSource` instances.
- Runtime connection strings must carry a stable application name (the `Application Name` keyword, exposed as `NpgsqlConnectionStringBuilder.ApplicationName`).
- Raw `new NpgsqlConnection(...)` is reserved for explicit CLI/setup, migration, or diagnostic exceptions.

## Valkey / Redis
- Steady-state runtime `ConnectionMultiplexer` construction must stamp a stable `ClientName`.
- Runtime code should build `ConfigurationOptions`, apply client identity, and then connect.
- Shared factories may provide a module-level default `ClientName` when the caller does not supply one.
- CLI/setup tooling, smoke tools, and test fixtures are allowed exceptions when they are explicitly allowlisted in convention tests.

## HTTP
- Runtime code should use `IHttpClientFactory`, typed clients, or module-specific wrappers instead of ad hoc `new HttpClient()`.
- When DI-backed wiring is not available yet, compatibility fallbacks must still avoid per-request or per-call `new HttpClient()` churn.
- Plugin loaders that activate runtime components should use service-provider-backed construction when available so named clients and other shared transports can flow into plugins.
- Existing analyzer-based guardrails remain in place for specialized modules, and the shared convention suite now covers the scoped host-owned HTTP hotspot waves across Integrations, ReleaseOrchestrator connector helpers, and OCI fallback publishers.

## Static Enforcement
- `src/__Libraries/__Tests/StellaOps.Infrastructure.Postgres.Tests/RuntimePostgresConstructionConventionTests.cs` enforces the shared PostgreSQL and Valkey runtime construction rules plus the scoped HTTP hotspot regression checks.

## Operational Goal
- Every long-lived runtime transport should be attributable in production diagnostics without relying on IP-only correlation.
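
Attribution is easiest to check from the server side. PostgreSQL reports `application_name` and `client_addr` per backend in `pg_stat_activity`, and Valkey's `CLIENT LIST` output includes each connection's `name`. The Python sketch below assumes rows have already been fetched as `(application_name, client_addr)` pairs and groups them by name, so unattributed connections surface as a single `<anonymous>` bucket instead of being traced by IP; `attribution_report` is a hypothetical helper and the sample values are made up.

```python
from collections import Counter
from typing import Iterable, Optional, Tuple


def attribution_report(rows: Iterable[Tuple[Optional[str], str]]) -> Counter:
    """Count live connections per application name; blank names are unattributed."""
    report: Counter = Counter()
    for application_name, _client_addr in rows:
        report[(application_name or "").strip() or "<anonymous>"] += 1
    return report


if __name__ == "__main__":
    sample = [  # made-up rows shaped like (application_name, client_addr)
        ("stellaops-findings", "10.0.0.5"),
        ("stellaops-findings", "10.0.0.5"),
        ("", "10.0.0.7"),
    ]
    for name, count in attribution_report(sample).most_common():
        print(name, count)
```

A non-zero `<anonymous>` count is the signal that some runtime path still connects without the attribution these rules require.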