## Pack 15 — Operations: **Data Integrity** (Nightly Ops Report + Data Freshness unified; bubbles into Dashboard / Releases / Approvals)

This pack creates a single **Operations → Data Integrity** area that answers: “**Can we trust today’s SBOM/CVE/reachability data to approve and promote? If not, what’s broken, where, and what decisions are impacted?**”

It **does not duplicate** existing specialized pages (Scheduler/Orchestrator/Integrations/Feeds). It **summarizes + links** to them.

---

# 15.1 Operations menu graph (Mermaid) — Data Integrity added as a first‑class Ops area

```mermaid
flowchart TD
  OPS[Operations] --> OPS_DI[Data Integrity]
  OPS --> OPS_PH[Platform Health]
  OPS --> OPS_ORCH[Orchestrator]
  OPS --> OPS_SCHED[Scheduler]
  OPS --> OPS_DLQ[Dead Letter]
  OPS --> OPS_QUOTA[Quotas]
  OPS --> OPS_EXPORT[Export]

  OPS_DI --> DI_OV[Overview]
  OPS_DI --> DI_NIGHT[Nightly Ops Report]
  OPS_DI --> DI_FEEDS[Feeds Freshness]
  OPS_DI --> DI_SCAN[Scan Pipeline Health]
  OPS_DI --> DI_REACH[Reachability Ingest Health]
  OPS_DI --> DI_INTEG[Integration Connectivity]
  OPS_DI --> DI_DLQ[DLQ & Replays]
  OPS_DI --> DI_SLO[Data Quality SLOs]
```

**Design intent:** “Data Integrity” is the operator console for **freshness + pipeline status** that directly affects approvals/promotions.


---

# 15.2 Bubble‑up graph (Mermaid) — how Data Integrity signals surface elsewhere (no duplication)

```mermaid
flowchart LR
  DI["Ops: Data Integrity\n(single source of truth for data health)"] --> DASH[Dashboard\nNightly Ops Signals card]
  DI --> REL[Releases List\nData Health column + banner]
  DI --> APR[Approvals\nOps/Data tab + warnings]
  DI --> SEC[Security Overview\nFeed freshness + scan freshness badges]
  DI --> ENV[Env Detail\nSBOM freshness + runtime coverage]

  DI --> INT[Integrations Hub\nconnector config & tests]
  DI --> FEED[Feeds & AirGap Ops\nmirrors/locks/airgap artifacts]
  DI --> SCHED[Scheduler Runs]
  DI --> ORCH[Orchestrator Jobs]
  DI --> DLQ[Dead Letter]
  DI --> PH[Platform Health]
```

---

# 15.3 Screen — Data Integrity Overview

### Previously (where it lived)

* There was **no single overview**.
* Equivalent fragments existed in:
  * **Nightly Ops Report** (your new screen request),
  * **Operations → Feeds** (freshness),
  * **Settings → Integrations** (connectivity),
  * **Settings → System → Background Jobs** (job failures),
  * **Operations → Dead Letter** (queue stuck),
  * plus scattered banners on approvals.

### Why changed like this

You need **one** authoritative place to see:

* **SBOM scan / rescan status**
* **CVE feed sync freshness**
* **Integration connectivity**
* **Reachability ingest health (build / image / runtime)**
* **Which approvals/releases are currently “unsafe to approve” because data is stale**

### Screen graph (Mermaid)

```mermaid
flowchart TD
  A[Data Integrity Overview] --> B[Nightly Ops Report]
  A --> C[Feeds Freshness]
  A --> D[Scan Pipeline Health]
  A --> E[Reachability Ingest Health]
  A --> F[Integration Connectivity]
  A --> G[DLQ & Replays]
  A --> H[Platform Health]
  A --> I["Impacted Decisions\n(approvals/releases)"]
```

### ASCII mock

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ OPERATIONS ▸ DATA INTEGRITY ▸ OVERVIEW                                                          │
│ Legacy: N/A (new). Previously: Ops Feeds + Settings System Jobs + Integrations + DLQ scattered  │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Scope: Region ▾ (All)   Environment Type ▾ (All)   Window ▾ (24h)                               │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ DATA TRUST SCORE (for approvals/promotions)                                                     │
│ Feeds Freshness: WARN (NVD stale 3h)              SBOM Pipeline: FAIL (rescan job failing)      │
│ Reachability Ingest: WARN (runtime coverage 35%)  Integrations: DEGRADED (Jenkins)              │
│ DLQ: WARN (reachability events queued: 1,230)                                                   │
│ Links: [Nightly Ops Report] [Feeds Freshness] [Integrations] [DLQ]                              │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ IMPACTED DECISIONS                                                                              │
│ Approvals blocked due to data issues: 2                                                         │
│   - Platform Release 1.3.0-rc1 → EU-West/eu-prod (SBOM incomplete + NVD stale) [Open]           │
│ Promotions running with WARN confidence: 1 [Open Releases filtered]                             │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ TOP FAILURES (what to fix first)                                                                │
│ 1) Nightly SBOM rescan FAILED (registry auth timeout) → stale SBOM on 12 component versions     │
│ 2) NVD feed stale 3h → CVE freshness gate WARN/FAIL depending on baseline                       │
│ 3) Runtime reachability ingest lagging (agent-apac-01 degraded) → runtime coverage 35%          │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

# 15.4 Screen — Nightly Ops Report (Jobs + causes + impact)

### Previously (where it lived)

* You asked for “some report about nightly jobs status” (new requirement).
* Related fragments existed in:
  * **Settings → System → Background Jobs**
  * **Operations → Scheduler** (runs)
  * **Operations → Orchestrator** (job execution)
  * plus manual checks in logs

### Why changed like this

Nightly Ops Report becomes the **release‑impact view** of jobs:

* not just “job failed”
* but **what release governance capability is now untrustworthy** (feeds/scans/reachability/evidence).

### Screen graph (Mermaid)

```mermaid
flowchart TD
  A[Nightly Ops Report] --> B[Job Run Detail]
  A --> C[Scheduler Runs]
  A --> D[Orchestrator]
  A --> E[DLQ & Replays]
  A --> F[Integrations Detail]
  A --> G[Impacted Bundles/Envs]
```

### ASCII mock

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ OPERATIONS ▸ DATA INTEGRITY ▸ NIGHTLY OPS REPORT                                                │
│ Legacy: Settings ▸ System ▸ Background Jobs + Ops Scheduler/Orchestrator (no release context)   │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Window: Last 24h   Region: All                                                                  │
│ Summary: 7 jobs OK   2 WARN   2 FAIL                                                            │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Job                          Schedule  Last Run  Status  Why it matters (release impact)        │
│-------------------------------------------------------------------------------------------------│
│ cve-sync-osv                 02:00     02:01     OK      vulnerability data freshness           │
│ cve-sync-nvd                 02:05     02:05     WARN    NVD stale → gating confidence drops    │
│ sbom-ingest-registry         02:10     02:10     OK      new images get SBOM                    │
│ sbom-nightly-rescan          02:20     02:21     FAIL    stale SBOM → approvals may block       │
│ reachability-ingest-image    02:30     02:31     OK      image reachability evidence            │
│ reachability-ingest-runtime  02:35     02:36     WARN    runtime reach coverage degraded        │
│ evidence-seal-bundles        02:45     02:46     OK      audit pack completion                  │
│-------------------------------------------------------------------------------------------------│
│ Row actions: [View Run] [Open Scheduler] [Open Orchestrator] [Open Integration] [Open DLQ]      │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

# 15.5 Screen — Job Run Detail (root cause + affected assets)

### Previously

* Scheduler/Orchestrator showed raw execution, but not mapped to:
  * “affected environments”
  * “affected bundles”
  * “approvals degraded”

### Why changed like this

This is the **investigation page** that bridges Ops mechanics to release decisions.

### Screen graph (Mermaid)

```mermaid
flowchart TD
  A[Job Run Detail] --> B[Logs & traces]
  A --> C["Failed items list\n(images/components/envs)"]
  A --> D[Open DLQ bucket]
  A --> E[Open Integration Detail]
  A --> F[Show impacted approvals/releases]
```

### ASCII mock

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ Job Run Detail: sbom-nightly-rescan (Run #8841)                                                 │
│ Legacy: Scheduler/Orchestrator run detail (without release impact mapping)                      │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Status: FAIL   Started: 02:21   Ended: 02:24   Error: registry auth timeout                     │
│ Integration: Harbor Registry (token expired) → [Open Integration Detail]                        │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Affected items                                                                                  │
│ - 12 images not rescanned (SBOM freshness > 24h)                                                │
│ - 3 bundle versions impacted (approvals may block)                                              │
│ - Regions impacted: EU-West, US-East                                                            │
│ Links: [Open impacted approvals] [Open bundles] [Open DLQ bucket] [Open logs]                   │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

# 15.6 Screen — Feeds Freshness (operator view, but tied to gating)

### Previously (where it lived)

* **Operations → Feeds** (“Feed Mirror & AirGap Operations” → Sources & Freshness)
* Also partially visible as “feeds” cards under Integrations.

### Why changed like this

Feeds Freshness becomes a **Data Integrity subpage** because it’s primarily:

* “Can we trust vulnerability data for today’s approvals?”

It still links to **Feeds & AirGap Ops** for mirrors/locks (no duplication).

### Screen graph (Mermaid)

```mermaid
flowchart TD
  A[Feeds Freshness] --> B[Feeds & AirGap Ops: Sources]
  A --> C[Version Locks]
  A --> D[Mirror Detail]
  A --> E["Impacted approvals\n(CVE freshness gate)"]
```

### ASCII mock

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ OPERATIONS ▸ DATA INTEGRITY ▸ FEEDS FRESHNESS                                                   │
│ Legacy: Operations ▸ Feeds ▸ Sources & Freshness (and partial cards in Integrations)            │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Region: EU-West   SLA profile: Prod (fresh < 2h)                                                │
│                                                                                                 │
│ Source      Status   Last Sync   SLA    Resulting gate impact                                   │
│-------------------------------------------------------------------------------------------------│
│ OSV         OK       20m ago     6h     OK                                                      │
│ NVD         WARN     3h ago      2h     approvals may WARN/FAIL depending on baseline           │
│ CISA KEV    OK       3h ago      24h    OK                                                      │
│                                                                                                 │
│ Actions: [Open Feeds & AirGap Ops] [Apply Version Lock] [Retry NVD Sync]                        │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

# 15.7 Screen — Scan Pipeline Health (SBOM ingest + rescan + vulnerability match)

### Previously

* SBOM status scattered across:
  * Security views (findings)
  * Jobs views (background jobs)
  * Registry integration
* No single “pipeline health” page to explain staleness.

### Why changed like this

You explicitly require:

* “nightly SBOM re‑scan issues”
* “CVE source not synced”

This page shows the pipeline chain end‑to‑end and where it’s breaking.

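
As a rough illustration of “where it’s breaking”, the sketch below walks an ordered list of pipeline stages and reports the first one that is not OK. `Stage`, `first_break`, and the `OK`/`WARN`/`FAIL` status strings are hypothetical names chosen for this example, not an existing API; the sample values are taken from this screen's mock.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Stage:
    name: str
    status: str  # assumed vocabulary: "OK", "WARN", "FAIL"
    detail: str = ""


def first_break(stages: list[Stage]) -> Optional[Stage]:
    """Return the earliest stage in chain order whose status is not OK."""
    for stage in stages:
        if stage.status != "OK":
            return stage
    return None


# Sample values copied from the Scan Pipeline Health mock.
STAGES = [
    Stage("Image discovery (registry)", "OK", "new images: 48"),
    Stage("SBOM generation/ingest", "OK", "sboms produced: 47, pending: 1"),
    Stage("Nightly SBOM rescan", "FAIL", "12 images stale > 24h"),
    Stage("CVE feeds sync", "WARN", "NVD stale 3h"),
    Stage("CVE ↔ SBOM match/update", "WARN", "results may be incomplete"),
]

broken = first_break(STAGES)
if broken is not None:
    print(f"Chain breaks at: {broken.name} [{broken.status}] {broken.detail}")
```

Reporting only the earliest broken stage is a design choice for this sketch: later stages consume the output of earlier ones, so the first failure is usually the one to fix first.
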
### Screen graph (Mermaid)

```mermaid
flowchart TD
  A[Scan Pipeline Health] --> B[SBOM ingest status]
  A --> C[SBOM rescan status]
  A --> D[CVE match status]
  A --> E[Open Nightly Ops Report]
  A --> F[Open Integrations]
  A --> G[Open Security findings impact]
```

### ASCII mock

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ OPERATIONS ▸ DATA INTEGRITY ▸ SCAN PIPELINE HEALTH                                              │
│ Legacy: implied across Security + System Jobs + Registry integration                            │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Pipeline stages (last 24h)                                                                      │
│ 1) Image discovery (registry)     OK    new images: 48                                          │
│ 2) SBOM generation/ingest         OK    sboms produced: 47   pending: 1                         │
│ 3) Nightly SBOM rescan            FAIL  12 images stale > 24h                                   │
│ 4) CVE feeds sync                 WARN  NVD stale 3h                                            │
│ 5) CVE ↔ SBOM match/update        WARN  results may be incomplete                               │
│                                                                                                 │
│ Impact summary                                                                                  │
│ - Environments with “unknown SBOM freshness”: 2 (EU-West prod, APAC uat)                        │
│ - Approvals blocked due to missing SBOM: 1                                                      │
│ Links: [Nightly Ops Report] [Feeds Freshness] [Integrations] [Security Findings]                │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

# 15.8 Screen — Reachability Ingest Health (Build / Image / Runtime)

### Previously

* Reachability was referenced in approvals/security, but ingestion health wasn’t first-class.
* Runtime evidence depended on agent telemetry; failures were seen indirectly.

### Why changed like this

You require hybrid reachability evidence from:

* **Dover image**
* **build**
* **running environment**

This screen makes it operationally visible when one source is missing so reachability confidence is downgraded.

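
One possible way to express the downgrade rule is to grade confidence by how many evidence sources meet a minimum coverage. This is a sketch under assumptions: `reachability_confidence`, the `HIGH`/`DEGRADED`/`LOW` levels, and the `image` and `build` minimums are placeholders invented for illustration. Only the coverage figures and the 50% runtime target come from the mocks in this pack.

```python
def reachability_confidence(
    coverage: dict[str, float],
    minimums: dict[str, float],
) -> tuple[str, list[str]]:
    """Grade confidence by how many evidence sources meet their minimum coverage.

    A source that is absent from `coverage` counts as 0.0, i.e. missing.
    """
    weak = [src for src, floor in minimums.items() if coverage.get(src, 0.0) < floor]
    if not weak:
        level = "HIGH"
    elif len(weak) < len(minimums):
        level = "DEGRADED"
    else:
        level = "LOW"
    return level, weak


# Coverage ratios from the Reachability Ingest Health mock.
COVERAGE = {"image": 1.00, "build": 0.78, "runtime": 0.35}

# Hypothetical minimums; only the runtime value mirrors the >50% SLO target.
MINIMUMS = {"image": 0.90, "build": 0.80, "runtime": 0.50}

level, weak_sources = reachability_confidence(COVERAGE, MINIMUMS)
print(level, weak_sources)
```

Returning the list of weak sources alongside the level lets the screen say which source caused the downgrade instead of showing a bare status.
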
### Screen graph (Mermaid)

```mermaid
flowchart TD
  A[Reachability Ingest Health] --> B[Image/Dover ingest]
  A --> C[Build ingest]
  A --> D[Runtime ingest]
  A --> E[Agents health]
  A --> F[DLQ bucket]
  A --> G[Impact: approvals using reachability gate]
```

### ASCII mock

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ OPERATIONS ▸ DATA INTEGRITY ▸ REACHABILITY INGEST HEALTH                                        │
│ Legacy: implicit (Approvals/Security reachability columns) + Agent health elsewhere             │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Coverage (last 24h)                                                                             │
│ Image/Dover: 100% (OK)   Build: 78% (WARN)   Runtime: 35% (WARN)                                │
│                                                                                                 │
│ Pipelines                                                                                       │
│ Image/Dover ingest:  OK    last batch: 02:31   backlog: 0                                       │
│ Build ingest:        WARN  last batch: 01:10   backlog: 220 (CI degraded)                       │
│ Runtime ingest:      WARN  last batch: 00:55   backlog: 1,230 (agent-apac-01 degraded)          │
│                                                                                                 │
│ Links: [Open Agents] [Open DLQ bucket] [Open impacted approvals]                                │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

# 15.9 Screen — Integration Connectivity (data‑plane dependencies)

### Previously

* **Settings → Integrations** (hub)
* But release operators need a data-integrity lens: “which pipeline is broken because which connector is down?”

### Why changed like this

This view is the “dependency slice” of Integrations:

* still links to the canonical **Integrations Hub** for configuration,
* but shows **pipeline impact** directly (feeds/scans/reachability/evidence).

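
The “dependency slice” can be pictured as a lookup from connector to dependent pipelines, filtered by connector status. The sketch below assumes a simple in-memory mapping; `DEPENDENCIES`, `STATUS`, and `impacted_pipelines` are hypothetical names, and the rows are copied from this screen's mock.

```python
# Connector -> pipelines that depend on it (rows from the Integration Connectivity mock).
DEPENDENCIES: dict[str, list[str]] = {
    "Harbor Registry": ["SBOM rescan", "image discovery"],
    "Jenkins": ["build reachability ingest", "attestations"],
    "Vault": ["env input materialization"],
    "Consul": ["env config bindings"],
    "NVD Source": ["CVE freshness"],
}

# Current connector status (assumed vocabulary, taken from the same mock).
STATUS: dict[str, str] = {
    "Harbor Registry": "WARN",
    "Jenkins": "DEGRADED",
    "Vault": "OK",
    "Consul": "OK",
    "NVD Source": "DISCONNECTED",
}


def impacted_pipelines(
    status: dict[str, str],
    dependencies: dict[str, list[str]],
) -> dict[str, list[str]]:
    """Map every connector that is not OK to the pipelines that depend on it."""
    return {
        connector: dependencies.get(connector, [])
        for connector, state in status.items()
        if state != "OK"
    }


for connector, pipelines in impacted_pipelines(STATUS, DEPENDENCIES).items():
    print(f"{connector} ({STATUS[connector]}): {', '.join(pipelines)}")
```

Keeping the dependency map separate from connector status matches the intent above: configuration stays canonical in the Integrations Hub, and this view only joins the two.
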
### Screen graph (Mermaid)

```mermaid
flowchart TD
  A[Integration Connectivity] --> B[Integrations Hub]
  A --> C[Open Integration Detail]
  A --> D[Show dependent jobs]
  A --> E[Show impacted approvals/releases]
```

### ASCII mock

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ OPERATIONS ▸ DATA INTEGRITY ▸ INTEGRATION CONNECTIVITY                                          │
│ Legacy: Settings ▸ Integrations (no “pipeline impact” slice)                                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Connector        Status        Dependent pipelines                      Impact                  │
│-------------------------------------------------------------------------------------------------│
│ Harbor Registry  WARN          SBOM rescan, image discovery             rescan failing          │
│ Jenkins          DEGRADED      build reachability ingest, attestations  build coverage down     │
│ Vault            OK            env input materialization                none                    │
│ Consul           OK            env config bindings                      none                    │
│ NVD Source       DISCONNECTED  CVE freshness                            approvals warn/block    │
│                                                                                                 │
│ Actions per row: [Open Detail] [Test] [View dependent jobs] [View impacted approvals]           │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

# 15.10 Screen — DLQ & Replays (data pipelines stuck)

### Previously

* **Operations → Dead Letter** existed, but not clearly integrated into “why approvals are unsafe.”

### Why changed like this

This screen becomes the “last mile” of data integrity:

* When pipelines fail, DLQ grows.
* DLQ items correspond to missing SBOM updates, missing reachability evidence, failed evidence sealing.

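
To make the bucket-to-impact link concrete, the sketch below groups dead-letter items by pipeline and attaches the release-impact label for each bucket. The item shape (`id`, `pipeline`), `summarize_buckets`, and the pairing of bucket names to impact labels are assumptions made for illustration; the sample items are invented and much smaller than the counts shown in the mock.

```python
from collections import Counter

# Assumed pairing of DLQ buckets to the impacts listed above.
IMPACT_BY_PIPELINE = {
    "sbom-nightly-rescan": "missing SBOM updates",
    "reachability-runtime-ingest": "missing reachability evidence",
    "evidence-seal-bundles": "failed evidence sealing",
}


def summarize_buckets(items: list[dict]) -> list[tuple[str, int, str]]:
    """Group DLQ items by pipeline, largest bucket first, with an impact label."""
    counts = Counter(item["pipeline"] for item in items)
    return [
        (pipeline, count, IMPACT_BY_PIPELINE.get(pipeline, "unknown impact"))
        for pipeline, count in counts.most_common()
    ]


# Hypothetical items for illustration only.
ITEMS = [
    {"id": "item-7781", "pipeline": "reachability-runtime-ingest"},
    {"id": "item-7782", "pipeline": "reachability-runtime-ingest"},
    {"id": "item-7783", "pipeline": "reachability-runtime-ingest"},
    {"id": "item-9001", "pipeline": "sbom-nightly-rescan"},
    {"id": "item-9002", "pipeline": "sbom-nightly-rescan"},
    {"id": "item-9100", "pipeline": "evidence-seal-bundles"},
]

for pipeline, count, impact in summarize_buckets(ITEMS):
    print(f"{pipeline}: {count} ({impact})")
```
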
### Screen graph (Mermaid)

```mermaid
flowchart TD
  A[DLQ & Replays] --> B[DLQ buckets by pipeline]
  A --> C[Item detail + payload]
  A --> D[Replay item]
  A --> E[Open Job Run Detail]
  A --> F[Open Integration Detail]
```

### ASCII mock

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ OPERATIONS ▸ DATA INTEGRITY ▸ DLQ & REPLAYS                                                     │
│ Legacy: Operations ▸ Dead Letter (queue view without release impact context)                    │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ Buckets (24h)                                                                                   │
│   reachability-runtime-ingest:  1,230   (agent degraded)                                        │
│   sbom-nightly-rescan:          340     (registry auth timeout)                                 │
│   evidence-seal-bundles:        12      (transparency log unreachable)                          │
│                                                                                                 │
│ Select bucket → items                                                                           │
│   item-7781   payload: runtime-trace batch#991   age: 2h   action: [Replay] [View] [Link job]   │
│   item-7782   payload: runtime-trace batch#992   age: 2h   action: [Replay] [View] [Link job]   │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

# 15.11 Screen — Data Quality SLOs (data‑SLO slice, links to System SLO Monitoring)

### Previously

* **Settings/System → SLO Monitoring** (or System root in the redesigned IA)

### Why changed like this

Keep SLO engine canonical under **System**, but provide a “data integrity slice” here so operators see:

* feed freshness SLO
* SBOM staleness SLO
* runtime coverage SLO

…with deep links to the full SLO view.

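
A data SLO in this slice reduces to a target, a direction, and a current value. The sketch below checks a few of them for breaches; `DataSlo` and `is_breached` are hypothetical names, not the System SLO engine's API, and the values are taken from the mocks in this pack (NVD age 3h against a 2h target, runtime coverage 35% against a 50% target, OSV age 20m against a 6h SLA).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DataSlo:
    name: str
    comparator: str  # "<": current must stay below target; ">": above target
    target: float
    current: float


def is_breached(slo: DataSlo) -> bool:
    """Return True when the current value no longer satisfies the target."""
    if slo.comparator == "<":
        return slo.current >= slo.target
    if slo.comparator == ">":
        return slo.current <= slo.target
    raise ValueError(f"unsupported comparator: {slo.comparator!r}")


SLOS = [
    DataSlo("NVD feed age (hours)", "<", 2.0, 3.0),
    DataSlo("Runtime reach coverage (%)", ">", 50.0, 35.0),
    DataSlo("OSV feed age (hours)", "<", 6.0, 20 / 60),
]

breaches = [slo.name for slo in SLOS if is_breached(slo)]
print(breaches)
```

How a breach maps to WARN versus FAIL is left out on purpose: that depends on gate policy, which stays with the canonical SLO engine.
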
### Screen graph (Mermaid)

```mermaid
flowchart TD
  A[Data Quality SLOs] --> B["System SLO Monitoring (canonical)"]
  A --> C[Show SLO breaches that impact approvals]
  A --> D[Open impacted approvals/releases]
```

### ASCII mock

```text
┌─────────────────────────────────────────────────────────────────────────────────────────────────┐
│ OPERATIONS ▸ DATA INTEGRITY ▸ DATA QUALITY SLOs                                                 │
│ Legacy: System ▸ SLO Monitoring (not scoped to data integrity)                                  │
├─────────────────────────────────────────────────────────────────────────────────────────────────┤
│ SLO                               Target   Current    Status   Approval impact                  │
│-------------------------------------------------------------------------------------------------│
│ CVE feed freshness (NVD/OSV)      <2h      3h         WARN     gate may warn/fail               │
│ SBOM staleness (prod envs)        <24h     12 stale   FAIL     blocks prod promotions           │
│ Runtime reach coverage (prod)     >50%     35%        WARN     reduces confidence               │
│                                                                                                 │
│ Links: [Open System SLO Monitoring] [Open impacted approvals]                                   │
└─────────────────────────────────────────────────────────────────────────────────────────────────┘
```

---

## What this pack changes on other screens (without duplicating them)

These are *UI hooks* (badges/banners/cards) driven by Data Integrity:

* **Dashboard**: “Nightly Ops Signals” card points to **Ops → Data Integrity Overview**.
* **Releases list**: “Data Health” column/badge links to **Data Integrity Overview** filtered to the region.
* **Approvals**: “Ops/Data” tab links to **Data Integrity Overview** + the exact failing job/feed/DLQ bucket.
* **Security Overview**: shows “feeds fresh / stale” and “SBOM freshness” badges, with a link to Data Integrity.

---

If you want the next pack: **Pack 16** can update the **Dashboard** mock explicitly to add the “Nightly Ops Signals” card and the **SBOM + reachable criticals by environment** summary you requested earlier, wired directly into the Data Integrity + Security + Env Detail pages.