up
@@ -1,602 +0,0 @@
|
||||
Here’s a simple, low‑friction way to keep priorities fresh without constant manual grooming: **let confidence decay over time**.
|
||||
|
||||
$$\mathrm{confidence}(t) = e^{-t/\tau}$$
|
||||
|
||||
# Exponential confidence decay (what & why)
|
||||
|
||||
* **Idea:** Every item (task, lead, bug, doc, hypothesis) has a confidence score that **automatically shrinks with time** if you don’t touch it.
|
||||
* **Formula:** `confidence(t) = e^(−t/τ)` where `t` is days since last signal (edit, comment, commit, new data), and **τ (“tau”)** is the decay constant.
|
||||
* **Rule of thumb:** With **τ = 30 days**, at **t = 30** the confidence is **e^(−1) ≈ 0.37**—about a **63% drop**. This surfaces long‑ignored items *gradually*, not with harsh “stale/expired” flips.
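
A half-life is often easier to reason about than τ. A short derivation, using only the formula above:

```latex
\begin{aligned}
e^{-t_{1/2}/\tau} = \tfrac{1}{2} \;&\Longrightarrow\; t_{1/2} = \tau \ln 2 \approx 0.693\,\tau \\
\tau = 30 \text{ days} \;&\Longrightarrow\; t_{1/2} \approx 20.8 \text{ days}
\end{aligned}
```

So with τ = 30, an untouched item loses half its confidence in about three weeks.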
|
||||
|
||||
# How to use it in practice
|
||||
|
||||
* **Signals that reset t → 0:** comment on the ticket, new benchmark, fresh log sample, doc update, CI run, new market news.
|
||||
* **Sort queues by:** `priority × confidence(t)` (or severity × confidence). Quiet items drift down; truly active ones stay up.
|
||||
* **Escalation bands:**
|
||||
|
||||
* `≥0.6` = green (recently touched)
|
||||
* `0.3–0.6` = amber (review soon)
|
||||
* `<0.3` = red (poke or close)
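
To see what the bands mean in calendar time, solve the decay formula for `t`. The numbers below assume τ = 30 days; other values of τ scale linearly.

```latex
\begin{aligned}
e^{-t/\tau} = c \;&\Longrightarrow\; t = \tau \ln\frac{1}{c} \\
c = 0.6,\ \tau = 30 \;&\Longrightarrow\; t \approx 15.3 \text{ days (green to amber)} \\
c = 0.3,\ \tau = 30 \;&\Longrightarrow\; t \approx 36.1 \text{ days (amber to red)}
\end{aligned}
```

In other words, an item turns amber after roughly two weeks of silence and red after roughly five.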
|
||||
|
||||
# Quick presets
|
||||
|
||||
* **Fast‑moving queues (incidents, hot leads):** τ = **7–14** days
|
||||
* **Engineering tasks / product docs:** τ = **30** days
|
||||
* **Research bets / roadmaps:** τ = **60–90** days
|
||||
|
||||
# For your world (Stella Ops + ops/dev work)
|
||||
|
||||
* **Vuln tickets:** `risk_score = CVSS × reachability × e^(−t/30)`
|
||||
* **Roadmap epics:** `value_score = impact × e^(−t/60)` to re‑rank quarterly.
|
||||
* **Docs:** show a badge “freshness: 42%” derived from last edit age to nudge updates.
|
||||
|
||||
# Minimal implementation sketch
|
||||
|
||||
* Store per‑item: `last_signal_at`, `base_priority`.
|
||||
* Compute on read:
|
||||
|
||||
```
|
||||
days = (now - last_signal_at).days
|
||||
conf = exp(-days / tau)
|
||||
score = base_priority * conf
|
||||
```
|
||||
* Recompute in your API layer or materialize nightly; no cron spam needed.
|
||||
|
||||
If you want, I can draft a tiny C# helper (and SQL snippet) you can drop into your issue service to add `confidence(t)` and color bands to your lists.
|
||||
Perfect, let’s turn the idea into something your devs can actually build.
|
||||
|
||||
Below is an implementation plan you can drop into a ticket or PRD, with clear phases, data model changes, APIs, and some sample code (C# + SQL). I’ll also sprinkle in Stella Ops–specific notes.
|
||||
|
||||
---
|
||||
|
||||
## 0. Scope & Objectives
|
||||
|
||||
**Goal:** Introduce `confidence(t)` as an automatic freshness factor that decays with time and is used to rank and highlight work.
|
||||
|
||||
We’ll apply it to:
|
||||
|
||||
* Vulnerabilities (Stella Ops)
|
||||
* General issues / tasks / epics
|
||||
* (Optional) Docs, leads, hypotheses later
|
||||
|
||||
**Core behavior:**
|
||||
|
||||
* Each item has:
|
||||
|
||||
* A base priority / risk (from severity, business impact, etc.)
|
||||
* A timestamp of last signal (meaningful activity)
|
||||
* A decay rate τ (tau) in days
|
||||
* Effective priority = `base_priority × confidence(t)`
|
||||
* `confidence(t) = exp(− t / τ)` where `t` = days since last_signal
|
||||
|
||||
---
|
||||
|
||||
## 1. Data Model Changes
|
||||
|
||||
### 1.1. Add fields to core “work item” tables
|
||||
|
||||
For each relevant table (`Issues`, `Vulnerabilities`, `Epics`, …):
|
||||
|
||||
**New columns:**
|
||||
|
||||
* `base_priority` (FLOAT or INT)
|
||||
|
||||
* Example: 1–100, or derived from severity.
|
||||
* `last_signal_at` (DATETIME, NOT NULL, default = `created_at`)
|
||||
* `tau_days` (FLOAT, nullable, falls back to type default)
|
||||
* (Optional) `confidence_cached` (FLOAT, for materialized score)
|
||||
* (Optional) `is_confidence_frozen` (BOOL, default FALSE)
|
||||
For pinned items that should not decay.
|
||||
|
||||
**Example Postgres migration (Issues):**
|
||||
|
||||
```sql
|
||||
ALTER TABLE issues
|
||||
ADD COLUMN base_priority DOUBLE PRECISION,
|
||||
ADD COLUMN last_signal_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
ADD COLUMN tau_days DOUBLE PRECISION,
|
||||
ADD COLUMN confidence_cached DOUBLE PRECISION,
|
||||
ADD COLUMN is_confidence_frozen BOOLEAN NOT NULL DEFAULT FALSE;
|
||||
```
|
||||
|
||||
For Stella Ops:
|
||||
|
||||
```sql
|
||||
ALTER TABLE vulnerabilities
|
||||
ADD COLUMN base_risk DOUBLE PRECISION,
|
||||
ADD COLUMN last_signal_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
ADD COLUMN tau_days DOUBLE PRECISION,
|
||||
ADD COLUMN confidence_cached DOUBLE PRECISION,
|
||||
ADD COLUMN is_confidence_frozen BOOLEAN NOT NULL DEFAULT FALSE;
|
||||
```
|
||||
|
||||
### 1.2. Add a config table for τ per entity type
|
||||
|
||||
```sql
|
||||
CREATE TABLE confidence_decay_config (
|
||||
id SERIAL PRIMARY KEY,
|
||||
entity_type TEXT NOT NULL, -- 'issue', 'vulnerability', 'epic', 'doc'
|
||||
tau_days_default DOUBLE PRECISION NOT NULL,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW()
|
||||
);
|
||||
|
||||
INSERT INTO confidence_decay_config (entity_type, tau_days_default) VALUES
|
||||
('incident', 7),
|
||||
('vulnerability', 30),
|
||||
('issue', 30),
|
||||
('epic', 60),
|
||||
('doc', 90);
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 2. Define “signal” events & instrumentation
|
||||
|
||||
We need a standardized way to say: “this item got activity → reset last_signal_at”.
|
||||
|
||||
### 2.1. Signals that should reset `last_signal_at`
|
||||
|
||||
For **issues / epics:**
|
||||
|
||||
* New comment
|
||||
* Status change (e.g., Open → In Progress)
|
||||
* Field change that matters (severity, owner, milestone)
|
||||
* Attachment added
|
||||
* Link to PR added or updated
|
||||
* New CI failure linked
|
||||
|
||||
For **vulnerabilities (Stella Ops):**
|
||||
|
||||
* New scanner result attached or status updated (e.g., “Verified”, “False Positive”)
|
||||
* New evidence (PoC, exploit notes)
|
||||
* SLA override change
|
||||
* Assignment / ownership change
|
||||
* Integration events (e.g., PR merge that references the vuln)
|
||||
|
||||
For **docs (if you do it):**
|
||||
|
||||
* Any edit
|
||||
* Comment/annotation
|
||||
|
||||
### 2.2. Implement a shared helper to record a signal
|
||||
|
||||
**Service-level helper (pseudocode / C#-ish):**
|
||||
|
||||
```csharp
|
||||
public interface IConfidenceSignalService
{
    Task RecordSignalAsync(WorkItemType type, Guid itemId, DateTime? signalTimeUtc = null);
}

public class ConfidenceSignalService : IConfidenceSignalService
{
    private readonly IWorkItemRepository _repo;
    private readonly IConfidenceConfigService _config;

    // Dependencies are injected; both interfaces are assumed to exist elsewhere.
    public ConfidenceSignalService(IWorkItemRepository repo, IConfidenceConfigService config)
    {
        _repo = repo;
        _config = config;
    }

    public async Task RecordSignalAsync(WorkItemType type, Guid itemId, DateTime? signalTimeUtc = null)
    {
        var now = signalTimeUtc ?? DateTime.UtcNow;
        var item = await _repo.GetByIdAsync(type, itemId);
        if (item == null) return; // unknown item: nothing to update

        item.LastSignalAt = now;

        // Item has no τ of its own yet: store the entity-type default.
        if (item.TauDays == null)
        {
            item.TauDays = await _config.GetDefaultTauAsync(type);
        }

        await _repo.UpdateAsync(item);
    }
}
|
||||
```
|
||||
|
||||
### 2.3. Wire signals into existing flows
|
||||
|
||||
Create small tasks for devs like:
|
||||
|
||||
* **ISS-01:** Call `RecordSignalAsync` on:
|
||||
|
||||
* New issue comment handler
|
||||
* Issue status update handler
|
||||
* Issue field update handler (severity/priority/owner)
|
||||
* **VULN-01:** Call `RecordSignalAsync` when:
|
||||
|
||||
* New scanner result ingested for a vuln
|
||||
* Vulnerability status, SLA, or owner changes
|
||||
* New exploit evidence is attached
|
||||
|
||||
---
|
||||
|
||||
## 3. Confidence & scoring calculation
|
||||
|
||||
### 3.1. Shared confidence function
|
||||
|
||||
Definition:
|
||||
|
||||
```csharp
|
||||
public static class ConfidenceMath
{
    // t = days since last signal
    public static double ConfidenceScore(DateTime lastSignalAtUtc, double tauDays, DateTime? nowUtc = null)
    {
        var now = nowUtc ?? DateTime.UtcNow;
        var tDays = (now - lastSignalAtUtc).TotalDays;

        if (tDays <= 0) return 1.0;
        if (tauDays <= 0) return 1.0; // guard / fallback

        var score = Math.Exp(-tDays / tauDays);

        // Optional: never drop below a tiny floor, so items never "disappear"
        const double floor = 0.01;
        return Math.Max(score, floor);
    }
}
|
||||
```
|
||||
|
||||
### 3.2. Effective priority formulas
|
||||
|
||||
**Generic issues / tasks:**
|
||||
|
||||
```csharp
|
||||
double effectiveScore = issue.BasePriority * ConfidenceMath.ConfidenceScore(issue.LastSignalAt, issue.TauDays ?? defaultTau);
|
||||
```
|
||||
|
||||
**Vulnerabilities (Stella Ops):**
|
||||
|
||||
Let’s define:
|
||||
|
||||
* `severity_weight`: map CVSS or severity string to numeric (e.g. Critical=100, High=80, Medium=50, Low=20).
|
||||
* `reachability`: 0–1 (e.g. from your reachability analysis).
|
||||
* `exploitability`: 0–1 (optional, based on known exploits).
|
||||
* `confidence`: as above.
|
||||
|
||||
```csharp
|
||||
double baseRisk = severityWeight * reachability * exploitability; // or simpler: severityWeight * reachability
|
||||
double conf = ConfidenceMath.ConfidenceScore(vuln.LastSignalAt, vuln.TauDays ?? defaultTau);
|
||||
double effectiveRisk = baseRisk * conf;
|
||||
```
|
||||
|
||||
Store `baseRisk` → `vulnerabilities.base_risk`, and compute `effectiveRisk` on the fly or via job.
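
A worked example may help when reviewing the formula. The input values here are purely illustrative (a Critical finding, 80% reachability, 0.5 exploitability, 15 days since the last signal, τ = 30):

```latex
\begin{aligned}
\text{baseRisk} &= 100 \times 0.8 \times 0.5 = 40 \\
\text{conf} &= e^{-15/30} = e^{-0.5} \approx 0.607 \\
\text{effectiveRisk} &= 40 \times 0.607 \approx 24.3
\end{aligned}
```

Only `conf` changes over time; `baseRisk` stays at 40 until one of its inputs changes.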
|
||||
|
||||
### 3.3. SQL implementation (optional for server-side sorting)
|
||||
|
||||
**Postgres example:**
|
||||
|
||||
```sql
|
||||
-- t_days = age in days
|
||||
-- tau = tau_days
|
||||
-- score = exp(-t_days / tau)
|
||||
|
||||
SELECT
|
||||
i.*,
|
||||
i.base_priority *
|
||||
GREATEST(
|
||||
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
|
||||
0.01
|
||||
) AS effective_priority
|
||||
FROM issues i
|
||||
ORDER BY effective_priority DESC;
|
||||
```
|
||||
|
||||
You can wrap that in a view:
|
||||
|
||||
```sql
|
||||
CREATE VIEW issues_with_confidence AS
|
||||
SELECT
|
||||
i.*,
|
||||
GREATEST(
|
||||
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
|
||||
0.01
|
||||
) AS confidence,
|
||||
i.base_priority *
|
||||
GREATEST(
|
||||
EXP(- EXTRACT(EPOCH FROM (NOW() - i.last_signal_at)) / (86400 * COALESCE(i.tau_days, 30))),
|
||||
0.01
|
||||
) AS effective_priority
|
||||
FROM issues i;
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Caching & performance
|
||||
|
||||
You have two options:
|
||||
|
||||
### 4.1. Compute on read (simplest to start)
|
||||
|
||||
* Use the helper function in your service layer or a DB view.
|
||||
* Pros:
|
||||
|
||||
* No jobs, always fresh.
|
||||
* Cons:
|
||||
|
||||
* Slight CPU cost on heavy lists.
|
||||
|
||||
**Plan:** Start with this. If you see perf issues, move to 4.2.
|
||||
|
||||
### 4.2. Periodic materialization job (optional later)
|
||||
|
||||
Add a scheduled job (e.g. hourly) that:
|
||||
|
||||
1. Selects all active items.
|
||||
2. Computes `confidence` and `effective_priority`.
|
||||
3. Writes to `confidence_cached` and `effective_priority_cached` (if you add such a column).
|
||||
|
||||
Service then sorts by cached values.
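
When picking the job interval, it helps to estimate how stale a cached value can get. If the job runs every Δ days, a cached confidence overstates the true value by at most the factor below just before the next run (a rough bound that ignores signals arriving between runs):

```latex
\begin{aligned}
\text{relative overstatement} &= 1 - e^{-\Delta/\tau} \\
\Delta = \tfrac{1}{24}\text{ day},\ \tau = 30 &\;\Rightarrow\; \approx 0.14\% \\
\Delta = \tfrac{1}{24}\text{ day},\ \tau = 7 &\;\Rightarrow\; \approx 0.59\% \\
\Delta = 1\text{ day},\ \tau = 7 &\;\Rightarrow\; \approx 13.3\%
\end{aligned}
```

Hourly runs are effectively exact for any τ listed in this plan, while a nightly run is noticeably coarse for fast-moving queues with τ = 7.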
|
||||
|
||||
---
|
||||
|
||||
## 5. Backfill & migration
|
||||
|
||||
### 5.1. Initial backfill script
|
||||
|
||||
For existing records:
|
||||
|
||||
* If `last_signal_at` is NULL → set to `created_at`.
|
||||
* Derive `base_priority` / `base_risk` from existing severity fields.
|
||||
* Set `tau_days` from config.
|
||||
|
||||
**Example:**
|
||||
|
||||
```sql
|
||||
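-- The migration's NOT NULL DEFAULT NOW() stamps every existing row with the
-- migration time, so there are no NULLs to match. Run this once, right after
-- the migration and before signals are wired in, so decay starts from creation.
UPDATE issues
SET last_signal_at = created_at;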
|
||||
|
||||
UPDATE issues
|
||||
SET base_priority = CASE severity
|
||||
WHEN 'critical' THEN 100
|
||||
WHEN 'high' THEN 80
|
||||
WHEN 'medium' THEN 50
|
||||
WHEN 'low' THEN 20
|
||||
ELSE 10
|
||||
END
|
||||
WHERE base_priority IS NULL;
|
||||
|
||||
UPDATE issues i
|
||||
SET tau_days = c.tau_days_default
|
||||
FROM confidence_decay_config c
|
||||
WHERE c.entity_type = 'issue'
|
||||
AND i.tau_days IS NULL;
|
||||
```
|
||||
|
||||
Do similarly for `vulnerabilities` using severity / CVSS.
|
||||
|
||||
### 5.2. Sanity checks
|
||||
|
||||
Add a small script/test to verify:
|
||||
|
||||
* Newly created items → `confidence ≈ 1.0`.
|
||||
* 30-day-old items with τ=30 → `confidence ≈ 0.37`.
|
||||
* Ordering changes when you edit/comment on items.
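
The first two checks need no database, so they can live in a tiny standalone script. This is a minimal sketch, assuming `awk` is available; `confidence` and `check` are hypothetical helper names, and the third check (ordering after an edit) still has to run against the real service.

```shell
#!/usr/bin/env bash
# Sanity checks for the decay formula; standalone, no database needed.
# `confidence` is a hypothetical helper that mirrors exp(-days / tau).
confidence() {
  awk -v t="$1" -v tau="$2" 'BEGIN { printf "%.2f\n", exp(-t / tau) }'
}

check() {  # check <label> <actual> <expected>
  if [ "$2" = "$3" ]; then
    echo "ok   $1"
  else
    echo "FAIL $1: got $2, expected $3"
    return 1
  fi
}

check "new item"        "$(confidence 0 30)"  "1.00"
check "30 days, tau=30" "$(confidence 30 30)" "0.37"
check "60 days, tau=30" "$(confidence 60 30)" "0.14"
```

Rounding to two decimals keeps the comparison stable across `awk` implementations.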
|
||||
|
||||
---
|
||||
|
||||
## 6. API & Query Layer
|
||||
|
||||
### 6.1. New sorting options
|
||||
|
||||
Update list APIs:
|
||||
|
||||
* Accept parameter: `sort=effective_priority` or `sort=confidence`.
|
||||
* Default sort for some views:
|
||||
|
||||
* Vulnerabilities backlog: `sort=effective_risk` (risk × confidence).
|
||||
* Issues backlog: `sort=effective_priority`.
|
||||
|
||||
**Example REST API contract:**
|
||||
|
||||
`GET /api/issues?sort=effective_priority&state=open`
|
||||
|
||||
**Response fields (additions):**
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "ISS-123",
|
||||
"title": "Fix login bug",
|
||||
"base_priority": 80,
|
||||
"last_signal_at": "2025-11-01T10:00:00Z",
|
||||
"tau_days": 30,
|
||||
"confidence": 0.63,
|
||||
"effective_priority": 50.4,
|
|
"confidence_band": "green"
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2. Confidence banding (for UI)
|
||||
|
||||
Define bands server-side (easy to change):
|
||||
|
||||
* Green: `confidence >= 0.6`
|
||||
* Amber: `0.3 ≤ confidence < 0.6`
|
||||
* Red: `confidence < 0.3`
|
||||
|
||||
You can compute on server:
|
||||
|
||||
```csharp
|
||||
string ConfidenceBand(double confidence) =>
|
||||
confidence >= 0.6 ? "green"
|
||||
: confidence >= 0.3 ? "amber"
|
||||
: "red";
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. UI / UX changes
|
||||
|
||||
### 7.1. List views (issues / vulns / epics)
|
||||
|
||||
For each item row:
|
||||
|
||||
* Show a small freshness pill:
|
||||
|
||||
* Text: `Active`, `Review soon`, `Stale`
|
||||
* Derived from confidence band.
|
||||
* Tooltip:
|
||||
|
||||
* “Confidence 90%. Last activity 3 days ago. τ = 30 days.”
|
||||
|
||||
* Sort default: by `effective_priority` / `effective_risk`.
|
||||
|
||||
* Filters:
|
||||
|
||||
* `Freshness: [All | Active | Review soon | Stale]`
|
||||
* Optionally: “Show stale only” toggle.
|
||||
|
||||
**Example labels:**
|
||||
|
||||
* Green: “Active (confidence 82%)”
|
||||
* Amber: “Review soon (confidence 45%)”
|
||||
* Red: “Stale (confidence 18%)”
|
||||
|
||||
### 7.2. Detail views
|
||||
|
||||
On an issue / vuln page:
|
||||
|
||||
* Add a “Confidence” section:
|
||||
|
||||
  * “Confidence: **67%**”
  * “Last signal: **12 days ago**”
  * “Decay τ: **30 days**”
  * “Effective priority: **Base 80 × 0.67 = 54**”
|
||||
|
||||
* (Optional) small mini-chart (text-only or simple bar) showing approximate decay, but not necessary for first iteration.
|
||||
|
||||
### 7.3. Admin / settings UI
|
||||
|
||||
Add an internal settings page:
|
||||
|
||||
* Table of entity types with editable τ:
|
||||
|
||||
| Entity type | τ (days) | Notes |
|
||||
| ------------- | -------- | ---------------------------- |
|
||||
| Incident | 7 | Fast-moving |
|
||||
| Vulnerability | 30 | Standard risk review cadence |
|
||||
| Issue | 30 | Sprint-level decay |
|
||||
| Epic | 60 | Quarterly |
|
||||
| Doc | 90 | Slow decay |
|
||||
|
||||
* Optionally: toggle to pin item (`is_confidence_frozen`) from UI.
|
||||
|
||||
---
|
||||
|
||||
## 8. Stella Ops–specific behavior
|
||||
|
||||
For vulnerabilities:
|
||||
|
||||
### 8.1. Base risk calculation
|
||||
|
||||
Ingested fields you likely already have:
|
||||
|
||||
* `cvss_score` or `severity`
|
||||
* `reachable` (true/false or numeric)
|
||||
* (Optional) `exploit_available` (bool) or exploitability score
|
||||
* `asset_criticality` (1–5)
|
||||
|
||||
Define `base_risk` as:
|
||||
|
||||
```text
|
||||
severity_weight = f(cvss_score or severity)
|
||||
reachability = reachable ? 1.0 : 0.5 -- example
|
||||
exploitability = exploit_available ? 1.0 : 0.7
|
||||
asset_factor = 0.5 + 0.1 * asset_criticality -- 1 → 0.6, 5 → 1.0
|
||||
|
||||
base_risk = severity_weight * reachability * exploitability * asset_factor
|
||||
```
|
||||
|
||||
Store `base_risk` on vuln row.
|
||||
|
||||
Then:
|
||||
|
||||
```text
|
||||
effective_risk = base_risk * confidence(t)
|
||||
```
|
||||
|
||||
Use `effective_risk` for backlog ordering and SLA dashboards.
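
To try the definitions above with concrete numbers, here is a small shell sketch. The function names are hypothetical, the factor values are the example ones from this section (not calibrated), and it assumes `awk` is available.

```shell
#!/usr/bin/env bash
# base_risk <severity_weight> <reachable:0|1> <exploit_available:0|1> <asset_criticality:1-5>
# Factors follow the example definitions above; they are illustrative, not calibrated.
base_risk() {
  awk -v sev="$1" -v reach="$2" -v expl="$3" -v crit="$4" 'BEGIN {
    reachability   = (reach == 1) ? 1.0 : 0.5
    exploitability = (expl == 1)  ? 1.0 : 0.7
    asset_factor   = 0.5 + 0.1 * crit
    printf "%.1f\n", sev * reachability * exploitability * asset_factor
  }'
}

# effective_risk <base_risk> <days_since_signal> <tau_days>
effective_risk() {
  awk -v base="$1" -v t="$2" -v tau="$3" 'BEGIN { printf "%.1f\n", base * exp(-t / tau) }'
}

# Critical, reachable, no known exploit, top-tier asset, 10 quiet days, tau = 30.
base=$(base_risk 100 1 0 5)
echo "base_risk=$base"
echo "effective_risk=$(effective_risk "$base" 10 30)"
```

With these inputs the base risk is 70.0 and the effective risk after ten quiet days is 50.2.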
|
||||
|
||||
### 8.2. Signals for vulns
|
||||
|
||||
Make sure these all call `RecordSignalAsync(WorkItemType.Vulnerability, vulnId)`:
|
||||
|
||||
* New scan result for same vuln (re-detected).
|
||||
* Change status to “In Progress”, “Ready for Deploy”, “Verified Fixed”, etc.
|
||||
* Assigning an owner.
|
||||
* Attaching PoC / exploit details.
|
||||
|
||||
### 8.3. Vuln UI copy ideas
|
||||
|
||||
* Pill text:
|
||||
|
||||
* “Risk: 850 (confidence 68%)”
|
||||
* “Last analyst activity 11 days ago”
|
||||
|
||||
* In backlog view: show **Effective Risk** as main sort, with a smaller subtext “Base 1200 × Confidence 71%”.
|
||||
|
||||
---
|
||||
|
||||
## 9. Rollout plan
|
||||
|
||||
### Phase 1 – Infrastructure (backend-only)
|
||||
|
||||
* [ ] DB migrations & config table
|
||||
* [ ] Implement `ConfidenceMath` and helper functions
|
||||
* [ ] Implement `IConfidenceSignalService`
|
||||
* [ ] Wire signals into key flows (comments, state changes, scanner ingestion)
|
||||
* [ ] Add `confidence` and `effective_priority/risk` to API responses
|
||||
* [ ] Backfill script + dry run in staging
|
||||
|
||||
### Phase 2 – Internal UI & feature flag
|
||||
|
||||
* [ ] Add optional sorting by effective score to internal/staff views
|
||||
* [ ] Add confidence pill (hidden behind feature flag `confidence_decay_v1`)
|
||||
* [ ] Dogfood internally:
|
||||
|
||||
* Do items bubble up/down as expected?
|
||||
* Are any items “disappearing” because decay is too aggressive?
|
||||
|
||||
### Phase 3 – Parameter tuning
|
||||
|
||||
* [ ] Adjust τ per type based on feedback:
|
||||
|
||||
* If things decay too fast → increase τ
|
||||
* If queues rarely change → decrease τ
|
||||
* [ ] Decide on confidence floor (0.01? 0.05?) so nothing goes to literal 0.
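
When choosing the floor, it is worth knowing how old an item must be before the floor takes over. Solving the decay formula for the floor value `f` (numbers assume τ = 30 days):

```latex
\begin{aligned}
e^{-t/\tau} = f \;&\Longrightarrow\; t_{\text{floor}} = \tau \ln\frac{1}{f} \\
f = 0.01,\ \tau = 30 \;&\Longrightarrow\; t_{\text{floor}} \approx 138 \text{ days} \\
f = 0.05,\ \tau = 30 \;&\Longrightarrow\; t_{\text{floor}} \approx 90 \text{ days}
\end{aligned}
```

Past that age every untouched item has the same confidence, so ordering among them falls back to base priority alone.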
|
||||
|
||||
### Phase 4 – General release
|
||||
|
||||
* [ ] Make effective score the default sort for key views:
|
||||
|
||||
* Vulnerabilities backlog
|
||||
* Issues backlog
|
||||
* [ ] Document behavior for users (help center / inline tooltip)
|
||||
* [ ] Add admin UI to tweak τ per entity type.
|
||||
|
||||
---
|
||||
|
||||
## 10. Edge cases & safeguards
|
||||
|
||||
* **New items**
|
||||
|
||||
* `last_signal_at = created_at`, confidence = 1.0.
|
||||
* **Pinned items**
|
||||
|
||||
* If `is_confidence_frozen = true` → treat confidence as 1.0.
|
||||
* **Items without τ**
|
||||
|
||||
* Always fallback to entity type default.
|
||||
* **Timezones**
|
||||
|
||||
* Always store & compute in UTC.
|
||||
* **Very old items**
|
||||
|
||||
* Floor the confidence so they’re still visible when explicitly searched.
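
The safeguards above combine into one small function. The following is a sketch under stated assumptions: `effective_confidence` is a hypothetical name, the default τ of 30 and floor of 0.01 are taken from the examples in this plan, and `awk` does the arithmetic.

```shell
#!/usr/bin/env bash
# effective_confidence <days> <tau_or_empty> <frozen:true|false> [default_tau] [floor]
# Hypothetical helper combining the safeguards listed above.
effective_confidence() {
  local days="$1" tau="$2" frozen="$3" default_tau="${4:-30}" floor="${5:-0.01}"
  if [ "$frozen" = "true" ]; then
    echo "1.0000"                      # pinned items do not decay
    return
  fi
  awk -v t="$days" -v tau="${tau:-$default_tau}" -v floor="$floor" 'BEGIN {
    if (t <= 0 || tau <= 0) { print "1.0000"; exit }   # new item or invalid tau
    c = exp(-t / tau)
    if (c < floor) c = floor                            # keep very old items visible
    printf "%.4f\n", c
  }'
}

effective_confidence 400 30 false     # very old: clamped to the floor
effective_confidence 12 "" false 60   # no per-item tau: falls back to the default
effective_confidence 90 30 true       # pinned
```

The three calls print `0.0100`, `0.8187`, and `1.0000` respectively.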
|
||||
|
||||
---
|
||||
|
||||
If you want, I can turn this into:
|
||||
|
||||
* A short **technical design doc** (with sections: Problem, Proposal, Alternatives, Rollout).
|
||||
* Or a **set of Jira tickets** grouped by backend / frontend / infra that your team can pick up directly.
|
||||
@@ -0,0 +1,402 @@
|
||||
# CLI Developer Experience and Command UX
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, command surface design, and implementation strategy for the Stella Ops CLI, covering developer experience, CI/CD integration, output formatting, and offline operation.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
The Stella Ops CLI is the **primary interface for developers and CI/CD pipelines** interacting with the platform. Key capabilities:
|
||||
|
||||
- **Native AOT Binary** - Sub-20ms startup, single binary distribution
|
||||
- **DPoP-Bound Authentication** - Secure device-code and service principal flows
|
||||
- **Deterministic Outputs** - JSON/table modes with stable exit codes for CI
|
||||
- **Buildx Integration** - SBOM generation at build time
|
||||
- **Offline Kit Management** - Air-gapped deployment support
|
||||
- **Shell Completions** - Bash/Zsh/Fish/PowerShell auto-complete
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | CLI Requirements | Use Case |
|
||||
|---------|-----------------|----------|
|
||||
| **DevSecOps** | CI integration, exit codes, JSON output | Pipeline gates |
|
||||
| **Security Engineers** | Verification commands, policy testing | Audit workflows |
|
||||
| **Platform Operators** | Offline kit, admin commands | Air-gap management |
|
||||
| **Developers** | Scan commands, buildx integration | Local development |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most CLI tools in the vulnerability space are slow or lack CI ergonomics. Stella Ops differentiates with:
|
||||
- **Native AOT** for instant startup (< 20ms vs 500ms+ for JIT)
|
||||
- **Deterministic exit codes** (12 distinct codes for CI decision trees)
|
||||
- **DPoP security** (no long-lived tokens on disk)
|
||||
- **Unified command surface** (50+ commands, consistent patterns)
|
||||
- **Offline-first design** (works without network in sealed mode)
|
||||
|
||||
---
|
||||
|
||||
## 3. Command Surface Architecture
|
||||
|
||||
### 3.1 Command Categories
|
||||
|
||||
| Category | Commands | Purpose |
|
||||
|----------|----------|---------|
|
||||
| **Auth** | `login`, `logout`, `status`, `token` | Authentication management |
|
||||
| **Scan** | `scan image`, `scan fs` | Vulnerability scanning |
|
||||
| **Export** | `export sbom`, `report final` | Artifact retrieval |
|
||||
| **Verify** | `verify attestation`, `verify referrers`, `verify image-signature` | Cryptographic verification |
|
||||
| **Policy** | `policy get`, `policy set`, `policy apply` | Policy management |
|
||||
| **Buildx** | `buildx install`, `buildx verify`, `buildx build` | Build-time SBOM |
|
||||
| **Runtime** | `runtime policy test` | Zastava integration |
|
||||
| **Offline** | `offline kit pull`, `offline kit import`, `offline kit status` | Air-gap operations |
|
||||
| **Decision** | `decision export`, `decision verify`, `decision compare` | VEX evidence management |
|
||||
| **AOC** | `sources ingest`, `aoc verify` | Aggregation-only guards |
|
||||
| **KMS** | `kms export`, `kms import` | Key management |
|
||||
| **Advise** | `advise run` | AI-powered advisory summaries |
|
||||
|
||||
### 3.2 Output Modes
|
||||
|
||||
**Human Mode (default):**
|
||||
```
|
||||
$ stella scan image nginx:latest --wait
|
||||
Scanning nginx:latest...
|
||||
Found 12 vulnerabilities (2 critical, 3 high, 5 medium, 2 low)
|
||||
Policy verdict: FAIL
|
||||
|
||||
Critical:
|
||||
- CVE-2025-12345 in openssl (fixed in 3.0.14)
|
||||
- CVE-2025-12346 in libcurl (no fix available)
|
||||
|
||||
See: https://ui.internal/scans/sha256:abc123...
|
||||
```
|
||||
|
||||
**JSON Mode (`--json`):**
|
||||
```json
|
||||
{"event":"scan.complete","status":"fail","critical":2,"high":3,"medium":5,"low":2,"url":"https://..."}
|
||||
```
|
||||
|
||||
### 3.3 Exit Codes
|
||||
|
||||
| Code | Meaning | CI Action |
|
||||
|------|---------|-----------|
|
||||
| 0 | Success | Continue |
|
||||
| 2 | Policy fail | Block deployment |
|
||||
| 3 | Verification failed | Security alert |
|
||||
| 4 | Auth error | Re-authenticate |
|
||||
| 5 | Resource not found | Check inputs |
|
||||
| 6 | Rate limited | Retry with backoff |
|
||||
| 7 | Backend unavailable | Retry |
|
||||
| 9 | Invalid arguments | Fix command |
|
||||
| 11-17 | AOC guard violations | Review ingestion |
|
||||
| 18 | Verification truncated | Increase limit |
|
||||
| 70 | Transport failure | Check network |
|
||||
| 71 | Usage error | Fix command |
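
As an illustration of how a pipeline might branch on these codes, here is a minimal sketch. `ci_action` is a hypothetical helper, not part of the CLI, and the action labels are shorthand for the table's "CI Action" column.

```shell
#!/usr/bin/env bash
# ci_action <exit-code>: prints the CI action from the table above.
ci_action() {
  case "$1" in
    0)      echo "continue" ;;
    2)      echo "block-deployment" ;;
    3)      echo "security-alert" ;;
    4)      echo "re-authenticate" ;;
    5)      echo "check-inputs" ;;
    6)      echo "retry-with-backoff" ;;
    7)      echo "retry" ;;
    9|71)   echo "fix-command" ;;
    1[1-7]) echo "review-ingestion" ;;
    18)     echo "increase-limit" ;;
    70)     echo "check-network" ;;
    *)      echo "unknown" ;;
  esac
}

# Typical use in a pipeline step (stella invocation as documented above):
#   stella scan image "$IMAGE_REF" --wait --json > scan-results.json
#   ci_action "$?"
ci_action 2
ci_action 14
```

The two sample calls print `block-deployment` and `review-ingestion`. Codes not listed in the table fall through to `unknown`, so a pipeline can treat them as a hard failure.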
|
||||
|
||||
---
|
||||
|
||||
## 4. Authentication Model
|
||||
|
||||
### 4.1 Device Code Flow (Interactive)
|
||||
|
||||
```bash
|
||||
$ stella auth login
|
||||
Opening browser for authentication...
|
||||
Device code: ABCD-EFGH
|
||||
Waiting for authorization...
|
||||
Logged in as user@example.com (tenant: acme-corp)
|
||||
```
|
||||
|
||||
### 4.2 Service Principal (CI/CD)
|
||||
|
||||
```bash
|
||||
$ stella auth login --client-credentials \
|
||||
--client-id $STELLA_CLIENT_ID \
|
||||
--private-key $STELLA_PRIVATE_KEY
|
||||
```
|
||||
|
||||
### 4.3 DPoP Key Management
|
||||
|
||||
- Ephemeral Ed25519 keypair generated on first login
|
||||
- Stored in OS keychain (Keychain/DPAPI/KWallet/Gnome Keyring)
|
||||
- Every request includes DPoP proof header
|
||||
- Tokens refreshed proactively (30s before expiry)
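
The proactive-refresh rule boils down to a single comparison. This is only an illustration of the idea, not the CLI's actual logic; `needs_refresh` and its arguments are hypothetical, with times given as Unix epoch seconds.

```shell
#!/usr/bin/env bash
# needs_refresh <expiry_epoch> [now_epoch] [skew_seconds]
# Returns 0 (true) when the token is within `skew` seconds of expiry.
needs_refresh() {
  local expiry="$1" now="${2:-$(date +%s)}" skew="${3:-30}"
  [ "$now" -ge $(( expiry - skew )) ]
}

if needs_refresh 1000 980; then echo "refresh"; else echo "still valid"; fi   # 20s left
if needs_refresh 1000 900; then echo "refresh"; else echo "still valid"; fi   # 100s left
```

The first call prints `refresh` and the second prints `still valid`. Refreshing slightly early avoids sending a request with a token that expires in flight.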
|
||||
|
||||
### 4.4 Token Credential Helper
|
||||
|
||||
```bash
|
||||
# Get one-shot token for curl/scripts
|
||||
TOKEN=$(stella auth token --aud scanner)
|
||||
curl -H "Authorization: Bearer $TOKEN" https://scanner.internal/api/...
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Buildx Integration
|
||||
|
||||
### 5.1 Generator Installation
|
||||
|
||||
```bash
|
||||
$ stella buildx install
|
||||
Installing SBOM generator plugin...
|
||||
Verifying signature: OK
|
||||
Generator installed at ~/.docker/cli-plugins/docker-buildx-stellaops
|
||||
|
||||
$ stella buildx verify
|
||||
Docker version: 24.0.7
|
||||
Buildx version: 0.12.1
|
||||
Generator: stellaops/sbom-indexer:v1.2.3@sha256:abc123...
|
||||
Status: Ready
|
||||
```
|
||||
|
||||
### 5.2 Build with SBOM
|
||||
|
||||
```bash
|
||||
$ stella buildx build -t myapp:v1.0.0 --push --attest
|
||||
Building myapp:v1.0.0...
|
||||
SBOM generation: enabled (stellaops/sbom-indexer)
|
||||
Provenance: enabled
|
||||
Attestation: requested
|
||||
|
||||
Build complete!
|
||||
Image: myapp:v1.0.0@sha256:def456...
|
||||
SBOM: attached as referrer
|
||||
Attestation: logged to Rekor (uuid: abc123)
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Implementation Strategy
|
||||
|
||||
### 6.1 Phase 1: Core Commands (Complete)
|
||||
|
||||
- [x] Auth commands with DPoP
|
||||
- [x] Scan/export commands
|
||||
- [x] JSON output mode
|
||||
- [x] Exit code standardization
|
||||
- [x] Shell completions
|
||||
|
||||
### 6.2 Phase 2: Buildx & Verification (Complete)
|
||||
|
||||
- [x] Buildx plugin management
|
||||
- [x] Attestation verification
|
||||
- [x] Referrer verification
|
||||
- [x] Report commands
|
||||
|
||||
### 6.3 Phase 3: Advanced Features (In Progress)
|
||||
|
||||
- [x] Decision export/verify commands
|
||||
- [x] AOC guard helpers
|
||||
- [x] KMS management
|
||||
- [ ] Advisory AI integration (CLI-ADVISE-48-001)
|
||||
- [ ] Filesystem scanning (CLI-SCAN-49-001)
|
||||
|
||||
### 6.4 Phase 4: Distribution (Planned)
|
||||
|
||||
- [ ] Homebrew formula
|
||||
- [ ] Scoop/Winget manifests
|
||||
- [ ] Self-update mechanism
|
||||
- [ ] Cosign signature verification
|
||||
|
||||
---
|
||||
|
||||
## 7. CI/CD Integration Patterns
|
||||
|
||||
### 7.1 GitHub Actions
|
||||
|
||||
```yaml
|
||||
- name: Install Stella CLI
  run: |
    curl -sSL https://get.stella-ops.io | sh
    echo "$HOME/.stella/bin" >> "$GITHUB_PATH"

- name: Authenticate
  run: stella auth login --client-credentials
  env:
    STELLAOPS_CLIENT_ID: ${{ secrets.STELLA_CLIENT_ID }}
    STELLAOPS_PRIVATE_KEY: ${{ secrets.STELLA_PRIVATE_KEY }}

- name: Scan Image
  run: |
    # Actions runs bash with -e, so capture the exit code instead of testing $? afterwards
    status=0
    stella scan image ${{ env.IMAGE_REF }} --wait --json > scan-results.json || status=$?
    if [ "$status" -eq 2 ]; then
      echo "::error::Policy failed - blocking deployment"
      exit 1
    fi
    exit "$status"

- name: Verify Attestation
  run: stella verify attestation --artifact ${{ env.IMAGE_DIGEST }}
|
||||
```
|
||||
|
||||
### 7.2 GitLab CI
|
||||
|
||||
```yaml
|
||||
scan:
  script:
    - stella auth login --client-credentials
    - stella buildx install
    - docker buildx build --attest=type=sbom,generator=stellaops/sbom-indexer -t $CI_REGISTRY_IMAGE:$CI_COMMIT_SHA .
    - stella scan image $CI_REGISTRY_IMAGE@$IMAGE_DIGEST --wait --json > scan-results.json
  artifacts:
    reports:
      container_scanning: scan-results.json
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 8. Configuration Model
|
||||
|
||||
### 8.1 Precedence
|
||||
|
||||
CLI flags > Environment variables > Config file > Defaults
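
The precedence chain amounts to "first non-empty value wins". A minimal sketch of that rule follows; `resolve` is a hypothetical helper, the URLs are placeholders, and treating an empty string as "not set" is an assumption.

```shell
#!/usr/bin/env bash
# resolve <flag_value> <env_value> <config_value> <default>
# Returns the first non-empty value, in precedence order.
resolve() {
  local v
  for v in "$@"; do
    if [ -n "$v" ]; then
      echo "$v"
      return 0
    fi
  done
  return 1
}

# No flag given, env var set, config file also set: the env var wins.
resolve "" "https://authority.env.example" "https://authority.example.com" "https://localhost"
```

The sample call prints `https://authority.env.example`.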
|
||||
|
||||
### 8.2 Config File
|
||||
|
||||
```yaml
|
||||
# ~/.config/stellaops/config.yaml
cli:
  authority: "https://authority.example.com"
  backend:
    scanner: "https://scanner.example.com"
    attestor: "https://attestor.example.com"
  auth:
    deviceCode: true
    audienceDefault: "scanner"
  output:
    json: false
    color: auto
  tls:
    caBundle: "/etc/ssl/certs/ca-bundle.crt"
  offline:
    kitMirror: "s3://mirror/stellaops-kit"
|
||||
```
|
||||
|
||||
### 8.3 Environment Variables
|
||||
|
||||
| Variable | Purpose |
|
||||
|----------|---------|
|
||||
| `STELLAOPS_AUTHORITY` | Authority URL |
|
||||
| `STELLAOPS_SCANNER_URL` | Scanner service URL |
|
||||
| `STELLAOPS_CLIENT_ID` | Service principal ID |
|
||||
| `STELLAOPS_PRIVATE_KEY` | Service principal key |
|
||||
| `STELLAOPS_TENANT` | Default tenant |
|
||||
| `STELLAOPS_JSON` | Enable JSON output |
|
||||
|
||||
---
|
||||
|
||||
## 9. Offline Operation
|
||||
|
||||
### 9.1 Sealed Mode Detection
|
||||
|
||||
```bash
|
||||
$ stella scan image nginx:latest
|
||||
Error: Sealed mode active - external network access blocked
|
||||
Remediation: Import offline kit or disable sealed mode
|
||||
|
||||
$ stella offline kit import latest-kit.tar.gz
|
||||
Importing offline kit...
|
||||
Advisories: 45,230 records
|
||||
VEX documents: 12,450 records
|
||||
Policy packs: 3 bundles
|
||||
Import complete!
|
||||
|
||||
$ stella scan image nginx:latest
|
||||
Scanning with offline data (2025-11-28)...
|
||||
```
|
||||
|
||||
### 9.2 Air-Gap Guard
|
||||
|
||||
All HTTP flows route through `StellaOps.AirGap.Policy`. When sealed mode is active:
|
||||
- External egress is blocked with `AIRGAP_EGRESS_BLOCKED` error
|
||||
- CLI provides clear remediation guidance
|
||||
- Local verification continues to work
|
||||
|
||||
---
|
||||
|
||||
## 10. Security Considerations
|
||||
|
||||
### 10.1 Credential Protection
|
||||
|
||||
- DPoP private keys stored in OS keychain only
|
||||
- No plaintext tokens on disk
|
||||
- Short-lived OpToks held in memory only
|
||||
- Authorization headers redacted from verbose logs
|
||||
|
||||
### 10.2 Binary Verification
|
||||
|
||||
```bash
|
||||
# Verify CLI binary signature
|
||||
$ stella version --verify
|
||||
Version: 1.2.3
|
||||
Built: 2025-11-29T12:00:00Z
|
||||
Signature: Valid (cosign)
|
||||
Signer: release@stella-ops.io
|
||||
```
|
||||
|
||||
### 10.3 Hard Lines
|
||||
|
||||
- Refuse to print token values
|
||||
- Disallow `--insecure` without explicit env var opt-in
|
||||
- Enforce short token TTL with proactive refresh
|
||||
- Device-code cache bound to machine + user
|
||||
|
||||
---
|
||||
|
||||
## 11. Performance Targets
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Startup time | < 20ms (AOT) |
|
||||
| Request overhead | < 5ms |
|
||||
| Large download (100MB) | > 80 MB/s |
|
||||
| Buildx wrapper overhead | < 1ms |
|
||||
|
||||
---
|
||||
|
||||
## 12. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| CLI architecture | `docs/modules/cli/architecture.md` |
|
||||
| Policy CLI guide | `docs/modules/cli/guides/policy.md` |
|
||||
| API/CLI reference | `docs/09_API_CLI_REFERENCE.md` |
|
||||
| Offline operation | `docs/24_OFFLINE_KIT.md` |
|
||||
|
||||
---
|
||||
|
||||
## 13. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0400_cli_ux.md (NEW)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_210_ui_ii.md (UI integration)
|
||||
- SPRINT_0187_0001_0001_evidence_locker_cli_integration.md (Evidence CLI)
|
||||
|
||||
**Key Task IDs:**
|
||||
- `CLI-AUTH-10-001` - DPoP authentication (DONE)
|
||||
- `CLI-SCAN-20-001` - Scan commands (DONE)
|
||||
- `CLI-BUILDX-30-001` - Buildx integration (DONE)
|
||||
- `CLI-ADVISE-48-001` - Advisory AI commands (IN PROGRESS)
|
||||
- `CLI-SCAN-49-001` - Filesystem scanning (TODO)
|
||||
|
||||
---
|
||||
|
||||
## 14. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Startup latency | < 20ms p99 |
|
||||
| CI adoption | 80% of pipelines use CLI |
|
||||
| Exit code coverage | 100% of failure modes |
|
||||
| Shell completion coverage | 100% of commands |
|
||||
| Offline operation success | Works without network |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -0,0 +1,476 @@
|
||||
# Concelier Advisory Ingestion Model
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, ingestion semantics, and implementation strategy for the Concelier module, covering the Link-Not-Merge model, connector pipelines, observation storage, and deterministic exports.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
Concelier is the **advisory ingestion engine** that acquires, normalizes, and correlates vulnerability advisories from authoritative sources. Key capabilities:
|
||||
|
||||
- **Aggregation-Only Contract** - No derived semantics in ingestion
|
||||
- **Link-Not-Merge** - Observations correlated, never merged
|
||||
- **Multi-Source Connectors** - Vendor PSIRTs, distros, OSS ecosystems
|
||||
- **Deterministic Exports** - Reproducible JSON, Trivy DB bundles
|
||||
- **Conflict Detection** - Structured payloads for divergent claims
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | Ingestion Requirements | Use Case |
|
||||
|---------|------------------------|----------|
|
||||
| **Security Teams** | Authoritative data | Accurate vulnerability assessment |
|
||||
| **Compliance** | Provenance tracking | Audit trail for advisory sources |
|
||||
| **DevSecOps** | Fast updates | CI/CD pipeline integration |
|
||||
| **Air-Gap Ops** | Offline bundles | Disconnected environment support |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most vulnerability databases merge data, losing provenance. Stella Ops differentiates with:
|
||||
- **Link-Not-Merge** preserving all source claims
|
||||
- **Conflict visibility** showing where sources disagree
|
||||
- **Deterministic exports** enabling reproducible builds
|
||||
- **Multi-format support** (CSAF, OSV, GHSA, vendor-specific)
|
||||
- **Signature verification** for upstream integrity
|
||||
|
||||
---
|
||||
|
||||
## 3. Aggregation-Only Contract (AOC)
|
||||
|
||||
### 3.1 Core Principles
|
||||
|
||||
The AOC ensures ingestion purity:
|
||||
|
||||
1. **No derived semantics** - No severity consensus, merged status, or fix hints
|
||||
2. **Immutable raw docs** - Append-only with version chains
|
||||
3. **Mandatory provenance** - Source, timestamp, signature status
|
||||
4. **Linkset only** - Joins stored separately, never mutate content
|
||||
5. **Deterministic canonicalization** - Stable JSON output
|
||||
6. **Idempotent upserts** - Same hash = no new record
|
||||
7. **CI verification** - AOCVerifier enforces at runtime
|
||||
|
||||
### 3.2 Enforcement
|
||||
|
||||
```csharp
|
||||
// AOCWriteGuard checks before every write
|
||||
public class AOCWriteGuard
|
||||
{
|
||||
Task GuardAsync(AdvisoryObservation obs)
|
||||
{
|
||||
// Verify no forbidden properties
|
||||
// Validate provenance completeness
|
||||
// Check tenant claims
|
||||
// Normalize timestamps
|
||||
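// Compute content hash
|
||||
return Task.CompletedTask; // placeholder so the skeleton compiles; a real guard would presumably throw on violation
|
||||
}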
|
||||
}
|
||||
```
|
||||
|
||||
Roslyn analyzers (`StellaOps.AOC.Analyzers`) scan connectors at build time to prevent forbidden property usage.
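
The guard steps named in the comments above can be sketched in Python for illustration. This is not the `AOCWriteGuard` implementation: the forbidden key names, the required provenance fields, and the choice to hash the canonical JSON of `content.raw` are all assumptions, and timestamp normalization is left out for brevity.

```python
import hashlib
import json

# Assumed names for derived fields; the real forbidden list is not given here.
FORBIDDEN_KEYS = {"effectiveSeverity", "mergedStatus", "fixHints"}
# Assumed minimum provenance; chosen from the observation fields in this document.
REQUIRED_PROVENANCE = [("source", "vendor"), ("upstream", "upstreamId"), ("upstream", "fetchedAt")]


class AocViolation(ValueError):
    """Raised when an observation breaks the Aggregation-Only Contract (hypothetical type)."""


def canonical_bytes(value) -> bytes:
    # Sorted keys and fixed separators give the same bytes for equal documents.
    return json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False).encode("utf-8")


def guard(observation: dict) -> dict:
    forbidden = FORBIDDEN_KEYS.intersection(observation)
    if forbidden:
        raise AocViolation(f"forbidden derived properties: {sorted(forbidden)}")
    for section, field in REQUIRED_PROVENANCE:
        if not observation.get(section, {}).get(field):
            raise AocViolation(f"missing provenance: {section}.{field}")
    if not observation.get("tenant"):
        raise AocViolation("missing tenant")
    digest = hashlib.sha256(canonical_bytes(observation["content"]["raw"])).hexdigest()
    # Return a copy so the caller's input is never mutated.
    guarded = dict(observation)
    guarded["upstream"] = {**observation["upstream"], "contentHash": f"sha256:{digest}"}
    return guarded
```

Because the hash is taken over canonical bytes, two fetches of the same upstream document with different key order yield the same `contentHash`, which is what makes idempotent upserts possible.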
|
||||
|
||||
---
|
||||
|
||||
## 4. Advisory Observation Model
|
||||
|
||||
### 4.1 Observation Structure
|
||||
|
||||
```json
|
||||
{
|
||||
"_id": "tenant:vendor:upstreamId:revision",
|
||||
"tenant": "acme-corp",
|
||||
"source": {
|
||||
"vendor": "OSV",
|
||||
"stream": "github",
|
||||
"api": "https://api.osv.dev/v1/.../GHSA-...",
|
||||
"collectorVersion": "concelier/1.7.3"
|
||||
},
|
||||
"upstream": {
|
||||
"upstreamId": "GHSA-xxxx-....",
|
||||
"documentVersion": "2025-09-01T12:13:14Z",
|
||||
"fetchedAt": "2025-09-01T13:04:05Z",
|
||||
"receivedAt": "2025-09-01T13:04:06Z",
|
||||
"contentHash": "sha256:...",
|
||||
"signature": {
|
||||
"present": true,
|
||||
"format": "dsse",
|
||||
"keyId": "rekor:.../key/abc"
|
||||
}
|
||||
},
|
||||
"content": {
|
||||
"format": "OSV",
|
||||
"specVersion": "1.6",
|
||||
"raw": { /* unmodified upstream document */ }
|
||||
},
|
||||
"identifiers": {
|
||||
"primary": "GHSA-xxxx-....",
|
||||
"aliases": ["CVE-2025-12345", "GHSA-xxxx-...."]
|
||||
},
|
||||
"linkset": {
|
||||
"purls": ["pkg:npm/lodash@4.17.21"],
|
||||
"cpes": ["cpe:2.3:a:lodash:lodash:4.17.21:*:*:*:*:*:*:*"],
|
||||
"references": [
|
||||
{"type": "advisory", "url": "https://..."},
|
||||
{"type": "fix", "url": "https://..."}
|
||||
]
|
||||
},
|
||||
"supersedes": "tenant:vendor:upstreamId:prev-revision",
|
||||
"createdAt": "2025-09-01T13:04:06Z"
|
||||
}
|
||||
```
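
To show how the `_id`, `contentHash`, and `supersedes` fields might interact, here is a small in-memory Python sketch of an append-only store with idempotent writes. It is an assumption-laden illustration, not the Concelier storage layer: the class and method names are invented, and the revision is modelled as a simple counter, which the document does not specify.

```python
class ObservationStore:
    """In-memory stand-in for an append-only observation collection (hypothetical)."""

    def __init__(self):
        self._docs = []      # append-only log; entries are never updated or removed
        self._latest = {}    # (tenant, vendor, upstreamId) -> newest document

    def __len__(self):
        return len(self._docs)

    def append(self, tenant, vendor, upstream_id, content_hash, raw):
        key = (tenant, vendor, upstream_id)
        previous = self._latest.get(key)
        # Same hash as the current head: idempotent, no new record is written.
        if previous is not None and previous["contentHash"] == content_hash:
            return previous
        revision = 1 if previous is None else previous["revision"] + 1
        doc = {
            "_id": f"{tenant}:{vendor}:{upstream_id}:{revision}",
            "revision": revision,  # assumed integer revision
            "contentHash": content_hash,
            "raw": raw,
            "supersedes": previous["_id"] if previous else None,
        }
        self._docs.append(doc)
        self._latest[key] = doc
        return doc
```

Note the sketch only compares against the newest revision, so upstream content that reverts to an older state still produces a new record in the chain.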
|
||||
|
||||
### 4.2 Linkset Correlation
|
||||
|
||||
```json
|
||||
{
|
||||
"_id": "sha256:...",
|
||||
"tenant": "acme-corp",
|
||||
"key": {
|
||||
"vulnerabilityId": "CVE-2025-12345",
|
||||
"productKey": "pkg:npm/lodash@4.17.21",
|
||||
"confidence": "high"
|
||||
},
|
||||
"observations": [
|
||||
{
|
||||
"observationId": "tenant:osv:GHSA-...:v1",
|
||||
"sourceVendor": "OSV",
|
||||
"statement": { "severity": "high" },
|
||||
"collectedAt": "2025-09-01T13:04:06Z"
|
||||
},
|
||||
{
|
||||
"observationId": "tenant:nvd:CVE-2025-12345:v2",
|
||||
"sourceVendor": "NVD",
|
||||
"statement": { "severity": "critical" },
|
||||
"collectedAt": "2025-09-01T14:00:00Z"
|
||||
}
|
||||
],
|
||||
"conflicts": [
|
||||
{
|
||||
"conflictId": "sha256:...",
|
||||
"type": "severity-mismatch",
|
||||
"observations": [
|
||||
{ "source": "OSV", "value": "high" },
|
||||
{ "source": "NVD", "value": "critical" }
|
||||
],
|
||||
"confidence": "medium",
|
||||
"detectedAt": "2025-09-01T14:00:01Z"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
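
The `severity-mismatch` entry in the example above could be derived along these lines. This Python sketch is illustrative only: the function name is invented, the `conflictId` derivation (a hash of the sorted claims) is an assumption, and the `confidence` field is omitted because the document does not say how it is computed.

```python
import hashlib
import json
from typing import Optional


def detect_severity_conflict(observations: list, detected_at: str) -> Optional[dict]:
    """Return a conflict payload if sources disagree on severity, else None."""
    # Sort the (source, severity) claims so the payload does not depend on input order.
    claims = sorted(
        {
            (obs["sourceVendor"], obs["statement"]["severity"])
            for obs in observations
            if "severity" in obs.get("statement", {})
        }
    )
    if len({value for _, value in claims}) < 2:
        return None  # zero or one distinct value: nothing to report
    payload = [{"source": source, "value": value} for source, value in claims]
    encoded = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    return {
        "conflictId": "sha256:" + hashlib.sha256(encoded).hexdigest(),  # assumed derivation
        "type": "severity-mismatch",
        "observations": payload,
        "detectedAt": detected_at,
    }
```

Both claims are kept in the payload and neither is overwritten, in line with the Link-Not-Merge model.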
|
||||
|
||||
---
|
||||
|
||||
## 5. Source Connectors
|
||||
|
||||
### 5.1 Source Families
|
||||
|
||||
| Family | Examples | Format |
|
||||
|--------|----------|--------|
|
||||
| **Vendor PSIRTs** | Microsoft, Oracle, Cisco, Adobe | CSAF, proprietary |
|
||||
| **Linux Distros** | Red Hat, SUSE, Ubuntu, Debian, Alpine | CSAF, JSON, XML |
|
||||
| **OSS Ecosystems** | OSV, GHSA, npm, PyPI, Maven | OSV, GraphQL |
|
||||
| **CERTs** | CISA (KEV), JVN, CERT-FR | JSON, XML |
|
||||
|
||||
### 5.2 Connector Contract
|
||||
|
||||
```csharp
|
||||
public interface IFeedConnector
|
||||
{
|
||||
string SourceName { get; }
|
||||
|
||||
// Fetch signed feeds or offline mirrors
|
||||
Task FetchAsync(IServiceProvider sp, CancellationToken ct);
|
||||
|
||||
// Normalize to strongly-typed DTOs
|
||||
Task ParseAsync(IServiceProvider sp, CancellationToken ct);
|
||||
|
||||
// Build canonical records with provenance
|
||||
Task MapAsync(IServiceProvider sp, CancellationToken ct);
|
||||
}
|
||||
```
|
||||
|
||||
### 5.3 Connector Lifecycle
|
||||
|
||||
1. **Snapshot** - Fetch with cursor, ETag, rate limiting
|
||||
2. **Parse** - Schema validation, normalization
|
||||
3. **Guard** - AOCWriteGuard enforcement
|
||||
4. **Write** - Append-only insert
|
||||
5. **Event** - Emit `advisory.observation.updated`
|
||||
|
||||
---
|
||||
|
||||
## 6. Version Semantics
|
||||
|
||||
### 6.1 Ecosystem Normalization
|
||||
|
||||
| Ecosystem | Format | Normalization |
|
||||
|-----------|--------|---------------|
|
||||
| npm, PyPI, Maven | SemVer | Intervals with `<`, `>=`, `~`, `^` |
|
||||
| RPM | EVR | `epoch:version-release` with order keys |
|
||||
| DEB | dpkg | Version comparison with order keys |
|
||||
| APK | Alpine | Computed order keys |
|
||||
|
||||
### 6.2 CVSS Handling
|
||||
|
||||
- Normalize CVSS v2/v3/v4 where available
|
||||
- Track all source CVSS values
|
||||
- Effective severity defaults to the maximum across sources (configurable)
|
||||
- Store KEV evidence with source and date
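
A minimal Python sketch of the "track all values, take the maximum" rule follows. The score bands are the standard CVSS v3.x qualitative ratings; the function names, the output field names, and the tie-break on source name are assumptions made for this example.

```python
def cvss3_severity(score: float) -> str:
    """Map a CVSS v3.x base score to its qualitative rating."""
    if not 0.0 <= score <= 10.0:
        raise ValueError(f"score out of range: {score}")
    if score == 0.0:
        return "none"
    if score < 4.0:
        return "low"
    if score < 7.0:
        return "medium"
    if score < 9.0:
        return "high"
    return "critical"


def effective_severity(scores_by_source: dict) -> dict:
    """Pick the highest score while keeping every source value for provenance."""
    if not scores_by_source:
        raise ValueError("at least one source score is required")
    # Tie-break on source name so the result is deterministic (assumed rule).
    source, score = max(scores_by_source.items(), key=lambda item: (item[1], item[0]))
    return {
        "severity": cvss3_severity(score),
        "score": score,
        "decidedBy": source,
        "sources": dict(sorted(scores_by_source.items())),
    }
```

For example, `effective_severity({"OSV": 8.1, "NVD": 9.8})` selects the NVD score while still returning both source values.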
|
||||
|
||||
---
|
||||
|
||||
## 7. Conflict Detection
|
||||
|
||||
### 7.1 Conflict Types
|
||||
|
||||
| Type | Description | Resolution |
|
||||
|------|-------------|------------|
|
||||
| `severity-mismatch` | Different severity ratings | Policy decides |
|
||||
| `affected-range-divergence` | Different version ranges | Most specific wins |
|
||||
| `reference-clash` | Contradictory references | Surface all |
|
||||
| `alias-inconsistency` | Different alias mappings | Union with provenance |
|
||||
| `metadata-gap` | Missing information | Flag for review |
|
||||
|
||||
### 7.2 Conflict Visibility
|
||||
|
||||
Conflicts are never hidden - they are:
|
||||
- Stored in linkset documents
|
||||
- Surfaced in API responses
|
||||
- Included in exports
|
||||
- Displayed in Console UI
|
||||
|
||||
---
|
||||
|
||||
## 8. Deterministic Exports
|
||||
|
||||
### 8.1 JSON Export
|
||||
|
||||
```
|
||||
exports/json/
|
||||
├── CVE/
|
||||
│ ├── 20/
|
||||
│ │ └── CVE-2025-12345.json
|
||||
│ └── ...
|
||||
├── manifest.json
|
||||
└── export-digest.sha256
|
||||
```
|
||||
|
||||
- Deterministic folder structure
|
||||
- Canonical JSON (sorted keys, stable timestamps)
|
||||
- Manifest with SHA-256 per file
|
||||
- Reproducible across runs
|
||||
|
||||
### 8.2 Trivy DB Export
|
||||
|
||||
```
|
||||
exports/trivy/
|
||||
├── db.tar.gz
|
||||
├── metadata.json
|
||||
└── manifest.json
|
||||
```
|
||||
|
||||
- BoltDB file compatible with Trivy
|
||||
- Full and delta modes
|
||||
- ORAS push to registries
|
||||
- Mirror manifests for domains
|
||||
|
||||
### 8.3 Export Determinism
|
||||
|
||||
Running the same export against the same data must produce:
|
||||
- Identical file contents
|
||||
- Identical manifest hashes
|
||||
- Identical export digests
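
The determinism requirement can be illustrated with a short Python sketch that builds an export in memory. It is a simplification under stated assumptions: files use a flat layout rather than the directory tree shown earlier, the manifest field names (`files`, `path`, `sha256`) are invented, and the export digest is assumed to be the SHA-256 of the manifest bytes.

```python
import hashlib
import json


def canonical_json(value) -> bytes:
    # Sorted keys and fixed separators: equal documents serialize to equal bytes.
    text = json.dumps(value, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
    return (text + "\n").encode("utf-8")


def build_export(advisories: dict):
    """Return ({relative path: file bytes}, export digest) for the given advisories."""
    files = {f"{adv_id}.json": canonical_json(doc) for adv_id, doc in advisories.items()}
    manifest = {
        "files": [
            {"path": path, "sha256": hashlib.sha256(files[path]).hexdigest()}
            for path in sorted(files)  # sorted paths keep the manifest order stable
        ]
    }
    files["manifest.json"] = canonical_json(manifest)
    export_digest = hashlib.sha256(files["manifest.json"]).hexdigest()
    return files, export_digest
```

Running `build_export` twice on the same data, regardless of dictionary insertion order, yields byte-identical files and the same digest; any wall-clock timestamp written into a file would break that property.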
|
||||
|
||||
---
|
||||
|
||||
## 9. Implementation Strategy
|
||||
|
||||
### 9.1 Phase 1: Core Pipeline (Complete)
|
||||
|
||||
- [x] AOCWriteGuard implementation
|
||||
- [x] Observation storage
|
||||
- [x] Basic connectors (Red Hat, SUSE, OSV)
|
||||
- [x] JSON export
|
||||
|
||||
### 9.2 Phase 2: Link-Not-Merge (Complete)
|
||||
|
||||
- [x] Linkset correlation engine
|
||||
- [x] Conflict detection
|
||||
- [x] Event emission
|
||||
- [x] API surface
|
||||
|
||||
### 9.3 Phase 3: Expanded Sources (In Progress)
|
||||
|
||||
- [x] GHSA GraphQL connector
|
||||
- [x] Debian DSA connector
|
||||
- [ ] Alpine secdb connector (CONCELIER-CONN-50-001)
|
||||
- [ ] CISA KEV enrichment (CONCELIER-KEV-51-001)
|
||||
|
||||
### 9.4 Phase 4: Export Enhancements (Planned)
|
||||
|
||||
- [ ] Delta Trivy DB exports
|
||||
- [ ] ORAS registry push
|
||||
- [ ] Attestation hand-off
|
||||
- [ ] Mirror bundle signing
|
||||
|
||||
---
|
||||
|
||||
## 10. API Surface
|
||||
|
||||
### 10.1 Sources & Jobs
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/v1/concelier/sources` | GET | `concelier.read` | List sources |
|
||||
| `/api/v1/concelier/sources/{name}/trigger` | POST | `concelier.admin` | Trigger fetch |
|
||||
| `/api/v1/concelier/sources/{name}/pause` | POST | `concelier.admin` | Pause source |
|
||||
| `/api/v1/concelier/jobs/{id}` | GET | `concelier.read` | Job status |
|
||||
|
||||
### 10.2 Exports
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/v1/concelier/exports/json` | POST | `concelier.export` | Trigger JSON export |
|
||||
| `/api/v1/concelier/exports/trivy` | POST | `concelier.export` | Trigger Trivy export |
|
||||
| `/api/v1/concelier/exports/{id}` | GET | `concelier.read` | Export status |
|
||||
|
||||
### 10.3 Search
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/v1/concelier/advisories/{key}` | GET | `concelier.read` | Get advisory |
|
||||
| `/api/v1/concelier/observations/{id}` | GET | `concelier.read` | Get observation |
|
||||
| `/api/v1/concelier/linksets` | GET | `concelier.read` | Query linksets |
|
||||
|
||||
---
|
||||
|
||||
## 11. Storage Model
|
||||
|
||||
### 11.1 Collections
|
||||
|
||||
| Collection | Purpose | Key Indexes |
|
||||
|------------|---------|-------------|
|
||||
| `sources` | Connector catalog | `{_id}` |
|
||||
| `source_state` | Run state | `{sourceName}` |
|
||||
| `documents` | Raw payloads | `{sourceName, uri}` |
|
||||
| `advisory_observations` | Normalized records | `{tenant, upstream.upstreamId}` |
|
||||
| `advisory_linksets` | Correlations | `{tenant, key.vulnerabilityId, key.productKey}` |
|
||||
| `advisory_events` | Change log | `{type, occurredAt}` |
|
||||
| `export_state` | Export cursors | `{exportKind}` |
|
||||
|
||||
### 11.2 GridFS Buckets
|
||||
|
||||
- `fs.documents` - Raw payloads (immutable)
|
||||
- `fs.exports` - Historical archives
|
||||
|
||||
---
|
||||
|
||||
## 12. Event Model
|
||||
|
||||
### 12.1 Events
|
||||
|
||||
| Event | Trigger | Content |
|
||||
|-------|---------|---------|
|
||||
| `advisory.observation.updated@1` | New/superseded observation | IDs, hash, supersedes |
|
||||
| `advisory.linkset.updated@1` | Correlation change | Deltas, conflicts |
|
||||
|
||||
### 12.2 Event Transport
|
||||
|
||||
- Primary: NATS
|
||||
- Fallback: Redis Streams
|
||||
- Offline Kit captures for replay
|
||||
|
||||
---
|
||||
|
||||
## 13. Observability
|
||||
|
||||
### 13.1 Metrics
|
||||
|
||||
- `concelier.fetch.docs_total{source}`
|
||||
- `concelier.fetch.bytes_total{source}`
|
||||
- `concelier.parse.failures_total{source}`
|
||||
- `concelier.observations.write_total{result}`
|
||||
- `concelier.linksets.updated_total{result}`
|
||||
- `concelier.linksets.conflicts_total{type}`
|
||||
- `concelier.export.duration_seconds{kind}`
|
||||
|
||||
### 13.2 Performance Targets
|
||||
|
||||
| Operation | Target |
|
||||
|-----------|--------|
|
||||
| Ingest throughput | 5k docs/min |
|
||||
| Observation write | < 5ms p95 |
|
||||
| Linkset build | < 15ms p95 |
|
||||
| Export (1M advisories) | < 90 seconds |
|
||||
|
||||
---
|
||||
|
||||
## 14. Security Considerations
|
||||
|
||||
### 14.1 Outbound Security
|
||||
|
||||
- Allowlist per connector (domains, protocols)
|
||||
- Proxy support with TLS pinning
|
||||
- Rate limiting per source
|
||||
|
||||
### 14.2 Signature Verification
|
||||
|
||||
- PGP, cosign, and X.509 verification results are recorded
|
||||
- Documents that fail verification are flagged, not rejected
|
||||
- Policy can down-weight unsigned sources
|
||||
|
||||
### 14.3 Determinism
|
||||
|
||||
- Canonical JSON writer
|
||||
- Stable export digests
|
||||
- Reproducible across runs
|
||||
|
||||
---
|
||||
|
||||
## 15. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| Concelier architecture | `docs/modules/concelier/architecture.md` |
|
||||
| Link-Not-Merge schema | `docs/modules/concelier/link-not-merge-schema.md` |
|
||||
| Event schemas | `docs/modules/concelier/events/` |
|
||||
| Attestation guide | `docs/modules/concelier/attestation.md` |
|
||||
|
||||
---
|
||||
|
||||
## 16. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0115_0001_0004_concelier_iv.md
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0113_0001_0002_concelier_ii.md
|
||||
- SPRINT_0114_0001_0003_concelier_iii.md
|
||||
|
||||
**Key Task IDs:**
|
||||
- `CONCELIER-AOC-40-001` - AOC enforcement (DONE)
|
||||
- `CONCELIER-LNM-41-001` - Link-Not-Merge (DONE)
|
||||
- `CONCELIER-CONN-50-001` - Alpine connector (IN PROGRESS)
|
||||
- `CONCELIER-KEV-51-001` - KEV enrichment (TODO)
|
||||
- `CONCELIER-EXPORT-55-001` - Delta exports (TODO)
|
||||
|
||||
---
|
||||
|
||||
## 17. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Advisory freshness | < 1 hour from source |
|
||||
| Ingestion accuracy | 100% provenance retention |
|
||||
| Export determinism | 100% hash reproducibility |
|
||||
| Conflict detection | 100% of source divergence |
|
||||
| Source coverage | 20+ authoritative sources |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -0,0 +1,449 @@
|
||||
# Export Center and Reporting Strategy
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, profile system, and implementation strategy for the Export Center module, covering bundle generation, adapter architecture, distribution channels, and compliance reporting.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
The Export Center is the **dedicated service layer for packaging reproducible evidence bundles**. Key capabilities:
|
||||
|
||||
- **Profile-Based Exports** - Four profile kinds (JSON, Trivy, Mirror, DevPortal) with seven built-in variants
|
||||
- **Deterministic Bundles** - Bit-for-bit reproducible outputs with DSSE signatures
|
||||
- **Multi-Format Adapters** - Pluggable adapters for different consumer needs
|
||||
- **Distribution Channels** - HTTP download, OCI push, object storage
|
||||
- **Compliance Ready** - Provenance, signatures, audit trails for SOC 2/FedRAMP
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | Export Requirements | Use Case |
|
||||
|---------|---------------------|----------|
|
||||
| **Compliance Teams** | Signed bundles, provenance | Audit evidence |
|
||||
| **Security Vendors** | Trivy DB format | Scanner integration |
|
||||
| **Air-Gap Operators** | Offline mirrors | Disconnected environments |
|
||||
| **Development Teams** | JSON exports | CI/CD integration |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most vulnerability platforms offer basic CSV/JSON exports. Stella Ops differentiates with:
|
||||
- **Reproducible bundles** with cryptographic verification
|
||||
- **Multi-format adapters** (Trivy, CycloneDX, SPDX, custom)
|
||||
- **OCI distribution** for container-native workflows
|
||||
- **Provenance attestations** meeting SLSA Level 2+
|
||||
- **Delta exports** for bandwidth-efficient updates
|
||||
|
||||
---
|
||||
|
||||
## 3. Profile System
|
||||
|
||||
### 3.1 Built-in Profiles
|
||||
|
||||
| Profile | Variant | Description | Output Format |
|
||||
|---------|---------|-------------|---------------|
|
||||
| **JSON** | `raw` | Unprocessed advisory/VEX data | `.jsonl.zst` |
|
||||
| **JSON** | `policy` | Policy-evaluated findings | `.jsonl.zst` |
|
||||
| **Trivy** | `db` | Trivy vulnerability database | BoltDB |
|
||||
| **Trivy** | `java-db` | Trivy Java advisory database | SQLite |
|
||||
| **Mirror** | `full` | Complete offline mirror | Filesystem tree |
|
||||
| **Mirror** | `delta` | Incremental updates | Filesystem tree |
|
||||
| **DevPortal** | `offline` | Developer portal assets | Archive |
|
||||
|
||||
### 3.2 Profile Configuration
|
||||
|
||||
```yaml
|
||||
apiVersion: stellaops.io/export.v1
|
||||
kind: ExportProfile
|
||||
metadata:
|
||||
name: compliance-report-monthly
|
||||
tenant: acme-corp
|
||||
|
||||
spec:
|
||||
kind: json
|
||||
variant: policy
|
||||
schedule: "0 0 1 * *" # Monthly
|
||||
|
||||
selectors:
|
||||
tenants: ["acme-corp"]
|
||||
timeWindow: "30d"
|
||||
severities: ["critical", "high"]
|
||||
ecosystems: ["npm", "maven", "pypi"]
|
||||
|
||||
options:
|
||||
compression: zstd
|
||||
encryption:
|
||||
enabled: true
|
||||
recipients: ["age1..."]
|
||||
signing:
|
||||
enabled: true
|
||||
keyRef: "kms://acme-corp/export-signing-key"
|
||||
|
||||
distribution:
|
||||
- type: http
|
||||
retention: 90d
|
||||
- type: oci
|
||||
registry: "registry.acme.com/exports"
|
||||
repository: "compliance-reports"
|
||||
```
|
||||
|
||||
### 3.3 Selector Expressions
|
||||
|
||||
| Selector | Description | Example |
|
||||
|----------|-------------|---------|
|
||||
| `tenants` | Tenant filter | `["acme-*"]` |
|
||||
| `timeWindow` | Time range | `"30d"`, `"2025-01-01/2025-12-31"` |
|
||||
| `products` | Product PURLs | `["pkg:npm/*", "pkg:maven/org.apache/*"]` |
|
||||
| `severities` | Severity filter | `["critical", "high"]` |
|
||||
| `ecosystems` | Package ecosystems | `["npm", "maven"]` |
|
||||
| `policyVersions` | Policy snapshot IDs | `["rev-42", "rev-43"]` |
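
As an illustration of how the selectors in this table might be evaluated, the Python sketch below filters a single record. It assumes shell-style glob semantics for patterns such as `acme-*` and `pkg:npm/*`, and it uses hypothetical record field names (`tenant`, `purl`, `severity`, `ecosystem`). `timeWindow` and `policyVersions` are left out to keep the example short.

```python
from fnmatch import fnmatchcase


def _any_glob(value: str, patterns: list) -> bool:
    # Case-sensitive glob match; "*" also matches "/" here (assumed semantics).
    return any(fnmatchcase(value, pattern) for pattern in patterns)


def matches(record: dict, selectors: dict) -> bool:
    """Return True when the record satisfies every selector that is present."""
    if "tenants" in selectors and not _any_glob(record["tenant"], selectors["tenants"]):
        return False
    if "products" in selectors and not _any_glob(record["purl"], selectors["products"]):
        return False
    if "severities" in selectors and record["severity"] not in selectors["severities"]:
        return False
    if "ecosystems" in selectors and record["ecosystem"] not in selectors["ecosystems"]:
        return False
    return True
```

Selectors combine with AND across keys and OR within a key in this sketch, which is one plausible reading of the table; an absent selector places no restriction.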
|
||||
|
||||
---
|
||||
|
||||
## 4. Adapter Architecture
|
||||
|
||||
### 4.1 Adapter Contract
|
||||
|
||||
```csharp
|
||||
public interface IExportAdapter
|
||||
{
|
||||
string Kind { get; } // "json" | "trivy" | "mirror"
|
||||
string Variant { get; } // "raw" | "policy" | "db"
|
||||
|
||||
Task<ExportResult> RunAsync(
|
||||
ExportContext context,
|
||||
IAsyncEnumerable<ExportRecord> records,
|
||||
CancellationToken ct);
|
||||
}
|
||||
```
|
||||
|
||||
### 4.2 JSON Adapter
|
||||
|
||||
**Responsibilities:**
|
||||
- Canonical JSON serialization (sorted keys, RFC3339 UTC)
|
||||
- Linkset preservation for traceability
|
||||
- Zstandard compression
|
||||
- AOC guardrails (no derived modifications to raw fields)
|
||||
|
||||
**Output:**
|
||||
```
|
||||
export/
|
||||
├── advisories.jsonl.zst
|
||||
├── vex-statements.jsonl.zst
|
||||
├── findings.jsonl.zst (policy variant)
|
||||
└── manifest.json
|
||||
```
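
The canonical serialization rule (sorted keys, RFC 3339 UTC timestamps) can be sketched in Python as follows. The function names are hypothetical, truncating timestamps to whole seconds is an assumption, and the Zstandard compression step is omitted to keep the sketch dependency-free.

```python
import json
from datetime import datetime, timezone


def rfc3339_utc(ts: datetime) -> str:
    """Render an aware datetime as RFC 3339 in UTC with a trailing Z."""
    if ts.tzinfo is None:
        raise ValueError("naive timestamps are ambiguous; attach a timezone first")
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


def _encode(value):
    # Called by json.dumps for objects it cannot serialize on its own.
    if isinstance(value, datetime):
        return rfc3339_utc(value)
    raise TypeError(f"unsupported type: {type(value).__name__}")


def to_jsonl(records) -> bytes:
    """Serialize records as one canonical JSON object per line."""
    lines = [
        json.dumps(record, sort_keys=True, separators=(",", ":"), ensure_ascii=False, default=_encode)
        for record in records
    ]
    return ("\n".join(lines) + "\n").encode("utf-8") if lines else b""
```

Converting every timestamp to UTC before writing means the same event exported from workers in different time zones still produces identical bytes.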
|
||||
|
||||
### 4.3 Trivy Adapter
|
||||
|
||||
**Responsibilities:**
|
||||
- Map Stella Ops advisory schema to Trivy DB format
|
||||
- Handle namespace collisions across ecosystems
|
||||
- Validate against supported Trivy schema versions
|
||||
- Generate severity distribution summary
|
||||
|
||||
**Compatibility:**
|
||||
- Trivy DB schema v2 (current)
|
||||
- Fail-fast on unsupported schema versions
|
||||
|
||||
### 4.4 Mirror Adapter
|
||||
|
||||
**Responsibilities:**
|
||||
- Build self-contained filesystem layout
|
||||
- Delta comparison against base manifest
|
||||
- Optional encryption of `/data` subtree
|
||||
- OCI layer generation
|
||||
|
||||
**Layout:**
|
||||
```
|
||||
mirror/
|
||||
├── manifests/
|
||||
│ ├── advisories.manifest.json
|
||||
│ └── vex.manifest.json
|
||||
├── data/
|
||||
│ ├── raw/
|
||||
│ │ ├── advisories/
|
||||
│ │ └── vex/
|
||||
│ └── policy/
|
||||
│ └── findings/
|
||||
├── indexes/
|
||||
│ └── by-cve.index
|
||||
└── manifest.json
|
||||
```
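
The delta comparison against a base manifest can be pictured with this small Python sketch. It assumes each manifest can be reduced to a mapping of relative path to content digest; the function name and the `added` / `changed` / `removed` result keys are invented for the example.

```python
def diff_manifests(base: dict, current: dict) -> dict:
    """Compare two {path: digest} mappings and report what a delta must carry."""
    return {
        # Present now but not in the base: ship the full file.
        "added": sorted(path for path in current if path not in base),
        # Present in both with a different digest: ship the new version.
        "changed": sorted(path for path in current if path in base and base[path] != current[path]),
        # Present in the base only: record a deletion.
        "removed": sorted(path for path in base if path not in current),
    }
```

Sorting each list keeps the delta description itself deterministic, so the same pair of manifests always yields the same result.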
|
||||
|
||||
---
|
||||
|
||||
## 5. Bundle Structure
|
||||
|
||||
### 5.1 Export Manifest
|
||||
|
||||
```json
|
||||
{
|
||||
"version": "1.0.0",
|
||||
"exportId": "export-20251129-001",
|
||||
"profile": {
|
||||
"kind": "json",
|
||||
"variant": "policy",
|
||||
"name": "compliance-report-monthly"
|
||||
},
|
||||
"tenant": "acme-corp",
|
||||
"generatedAt": "2025-11-29T12:00:00Z",
|
||||
"generatedBy": "export-center-worker-1",
|
||||
"selectors": {
|
||||
"timeWindow": "2025-11-01/2025-11-30",
|
||||
"severities": ["critical", "high"]
|
||||
},
|
||||
"contents": [
|
||||
{
|
||||
"path": "findings.jsonl.zst",
|
||||
"size": 1048576,
|
||||
"digest": "sha256:abc123...",
|
||||
"recordCount": 45230
|
||||
}
|
||||
],
|
||||
"totals": {
|
||||
"advisories": 45230,
|
||||
"vexStatements": 12450,
|
||||
"findings": 8920
|
||||
}
|
||||
}
|
||||
```
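
A consumer could check an unpacked bundle against the `contents` entries of such a manifest roughly as follows. This Python sketch is an illustration, not a Stella Ops tool: the function name is hypothetical, and it assumes digests are written as `sha256:` followed by the lowercase hex digest, as in the example above.

```python
import hashlib
from pathlib import Path


def verify_contents(manifest: dict, bundle_root: Path) -> list:
    """Return a list of human-readable problems; an empty list means the bundle matches."""
    problems = []
    for entry in manifest["contents"]:
        path = bundle_root / entry["path"]
        if not path.is_file():
            problems.append(f"{entry['path']}: missing")
            continue
        data = path.read_bytes()  # fine for a sketch; stream large files in practice
        if len(data) != entry["size"]:
            problems.append(f"{entry['path']}: size {len(data)} != {entry['size']}")
        actual = "sha256:" + hashlib.sha256(data).hexdigest()
        if actual != entry["digest"]:
            problems.append(f"{entry['path']}: digest mismatch")
    return problems
```

This check only covers file integrity; verifying the signature over the manifest itself is a separate step.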
|
||||
|
||||
### 5.2 Provenance Attestation
|
||||
|
||||
```json
|
||||
{
|
||||
"predicateType": "https://slsa.dev/provenance/v1",
|
||||
"subject": [
|
||||
{
|
||||
"name": "export-20251129-001.tar.gz",
|
||||
"digest": { "sha256": "def456..." }
|
||||
}
|
||||
],
|
||||
"predicate": {
|
||||
"buildDefinition": {
|
||||
"buildType": "https://stellaops.io/export/v1",
|
||||
"externalParameters": {
|
||||
"profile": "compliance-report-monthly",
|
||||
"selectors": { "...": "..." }
|
||||
}
|
||||
},
|
||||
"runDetails": {
|
||||
"builder": {
|
||||
"id": "https://stellaops.io/export-center",
|
||||
"version": "1.2.3"
|
||||
},
|
||||
"metadata": {
|
||||
"invocationId": "export-run-123",
|
||||
"startedOn": "2025-11-29T12:00:00Z",
|
||||
"finishedOn": "2025-11-29T12:05:00Z"
|
||||
}
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Distribution Channels
|
||||
|
||||
### 6.1 HTTP Download
|
||||
|
||||
```bash
|
||||
# Download bundle
|
||||
curl -H "Authorization: Bearer $TOKEN" \
|
||||
"https://export.stellaops.io/api/export/runs/{id}/download" \
|
||||
-o export-bundle.tar.gz
|
||||
|
||||
# Verify signature
|
||||
cosign verify-blob --key export-key.pub \
|
||||
--signature export-bundle.sig \
|
||||
export-bundle.tar.gz
|
||||
```
|
||||
|
||||
**Features:**
|
||||
- Chunked transfer encoding
|
||||
- Range request support (resumable)
|
||||
- `X-Export-Digest` header
|
||||
- Optional encryption metadata
|
||||
|
||||
### 6.2 OCI Push
|
||||
|
||||
```bash
|
||||
# Pull from registry
|
||||
oras pull registry.example.com/exports/compliance:2025-11
|
||||
|
||||
# Verify annotations
|
||||
oras manifest fetch registry.example.com/exports/compliance:2025-11 | jq '.annotations'
|
||||
```
|
||||
|
||||
**Annotations:**
|
||||
- `io.stellaops.export.profile`
|
||||
- `io.stellaops.export.tenant`
|
||||
- `io.stellaops.export.manifest-digest`
|
||||
- `io.stellaops.export.provenance-ref`
|
||||
|
||||
### 6.3 Object Storage
|
||||
|
||||
```yaml
|
||||
distribution:
|
||||
- type: object
|
||||
provider: s3
|
||||
bucket: stella-exports
|
||||
prefix: "${tenant}/${exportId}"
|
||||
retention: 365d
|
||||
immutable: true
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Implementation Strategy
|
||||
|
||||
### 7.1 Phase 1: Core Infrastructure (Complete)
|
||||
|
||||
- [x] Profile CRUD APIs
|
||||
- [x] JSON adapter (raw, policy)
|
||||
- [x] HTTP download distribution
|
||||
- [x] Manifest generation
|
||||
|
||||
### 7.2 Phase 2: Trivy Integration (Complete)
|
||||
|
||||
- [x] Trivy DB adapter
|
||||
- [x] Trivy Java DB adapter
|
||||
- [x] Schema version validation
|
||||
- [x] Compatibility testing
|
||||
|
||||
### 7.3 Phase 3: Mirror & Distribution (In Progress)
|
||||
|
||||
- [x] Mirror full adapter
|
||||
- [x] Mirror delta adapter
|
||||
- [ ] OCI push distribution (EXPORT-OCI-45-001)
|
||||
- [ ] DevPortal adapter (EXPORT-DEV-46-001)
|
||||
|
||||
### 7.4 Phase 4: Advanced Features (Planned)
|
||||
|
||||
- [ ] Encryption at rest
|
||||
- [ ] Scheduled exports
|
||||
- [ ] Retention policies
|
||||
- [ ] Cross-tenant exports (with approval)
|
||||
|
||||
---
|
||||
|
||||
## 8. API Surface
|
||||
|
||||
### 8.1 Profile Management
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/export/profiles` | GET | `export:read` | List profiles |
|
||||
| `/api/export/profiles` | POST | `export:profile:manage` | Create profile |
|
||||
| `/api/export/profiles/{id}` | PATCH | `export:profile:manage` | Update profile |
|
||||
| `/api/export/profiles/{id}` | DELETE | `export:profile:manage` | Delete profile |
|
||||
|
||||
### 8.2 Export Runs
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/export/runs` | POST | `export:run` | Start export |
|
||||
| `/api/export/runs/{id}` | GET | `export:read` | Get status |
|
||||
| `/api/export/runs/{id}/events` | SSE | `export:read` | Stream progress |
|
||||
| `/api/export/runs/{id}/cancel` | POST | `export:run` | Cancel export |
|
||||
|
||||
### 8.3 Downloads
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/export/runs/{id}/download` | GET | `export:download` | Download bundle |
|
||||
| `/api/export/runs/{id}/manifest` | GET | `export:read` | Get manifest |
|
||||
| `/api/export/runs/{id}/provenance` | GET | `export:read` | Get provenance |
|
||||
|
||||
---
|
||||
|
||||
## 9. Observability
|
||||
|
||||
### 9.1 Metrics
|
||||
|
||||
- `exporter_run_duration_seconds{profile,tenant}`
|
||||
- `exporter_run_bytes_total{profile}`
|
||||
- `exporter_run_failures_total{error_code}`
|
||||
- `exporter_active_runs{tenant}`
|
||||
- `exporter_distribution_push_seconds{type}`
|
||||
|
||||
### 9.2 Logs
|
||||
|
||||
Structured fields:
|
||||
- `run_id`, `tenant`, `profile_kind`, `adapter`
|
||||
- `phase` (plan, resolve, adapter, manifest, sign, distribute)
|
||||
- `correlation_id`, `error_code`
|
||||
|
||||
---
|
||||
|
||||
## 10. Security Considerations
|
||||
|
||||
### 10.1 Access Control
|
||||
|
||||
- Tenant claim enforced at every query
|
||||
- Cross-tenant selectors rejected (unless approved)
|
||||
- RBAC scopes: `export:profile:manage`, `export:run`, `export:read`, `export:download`
|
||||
|
||||
### 10.2 Encryption
|
||||
|
||||
- Optional encryption per profile
|
||||
- Keys derived from Authority-managed KMS
|
||||
- Mirror encryption uses tenant-specific recipients
|
||||
- Transport security (TLS) always required
|
||||
|
||||
### 10.3 Signing
|
||||
|
||||
- Cosign-compatible signatures
|
||||
- SLSA Level 2 attestations by default
|
||||
- Detached signatures stored alongside manifests
|
||||
|
||||
---
|
||||
|
||||
## 11. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| Export Center architecture | `docs/modules/export-center/architecture.md` |
|
||||
| Profile definitions | `docs/modules/export-center/profiles.md` |
|
||||
| API reference | `docs/modules/export-center/api.md` |
|
||||
| DevPortal bundle spec | `docs/modules/export-center/devportal-offline.md` |
|
||||
|
||||
---
|
||||
|
||||
## 12. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0160_0001_0001_export_evidence.md
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0161_0001_0001_evidencelocker.md
|
||||
- SPRINT_0125_0001_0001_mirror.md
|
||||
|
||||
**Key Task IDs:**
|
||||
- `EXPORT-CORE-40-001` - Profile system (DONE)
|
||||
- `EXPORT-JSON-41-001` - JSON adapters (DONE)
|
||||
- `EXPORT-TRIVY-42-001` - Trivy adapters (DONE)
|
||||
- `EXPORT-OCI-45-001` - OCI distribution (IN PROGRESS)
|
||||
- `EXPORT-DEV-46-001` - DevPortal adapter (TODO)
|
||||
|
||||
---
|
||||
|
||||
## 13. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Export reproducibility | 100% bit-identical |
|
||||
| Bundle generation time | < 5 min for 100k records |
|
||||
| Signature verification | 100% success rate |
|
||||
| Distribution availability | 99.9% uptime |
|
||||
| Retention compliance | 100% policy adherence |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -0,0 +1,407 @@
|
||||
# Findings Ledger and Immutable Audit Trail
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, ledger semantics, and implementation strategy for the Findings Ledger module, covering append-only events, Merkle anchoring, projections, and deterministic exports.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
The Findings Ledger provides **immutable, auditable records** of all vulnerability findings and their state transitions. Key capabilities:
|
||||
|
||||
- **Append-Only Events** - Every finding change recorded permanently
|
||||
- **Merkle Anchoring** - Cryptographic proof of event ordering
|
||||
- **Projections** - Materialized current state views
|
||||
- **Deterministic Exports** - Reproducible compliance archives
|
||||
- **Chain Integrity** - Hash-linked event sequences per tenant
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | Ledger Requirements | Use Case |
|
||||
|---------|---------------------|----------|
|
||||
| **Compliance** | Immutable audit trail | SOC 2, FedRAMP evidence |
|
||||
| **Security Teams** | Finding history | Investigation timelines |
|
||||
| **Legal/eDiscovery** | Tamper-proof records | Litigation support |
|
||||
| **Auditors** | Verifiable exports | Third-party attestation |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most vulnerability tools provide mutable databases. Stella Ops differentiates with:
|
||||
- **Append-only architecture** ensuring no record deletion
|
||||
- **Merkle trees** for cryptographic verification
|
||||
- **Chain integrity** with hash-linked events
|
||||
- **Deterministic exports** for reproducible audits
|
||||
- **Air-gap support** with signed bundles
|
||||
|
||||
---
|
||||
|
||||
## 3. Event Model
|
||||
|
||||
### 3.1 Ledger Event Structure
|
||||
|
||||
```json
|
||||
{
|
||||
"id": "uuid",
|
||||
"type": "finding.status.changed",
|
||||
"tenant": "acme-corp",
|
||||
"chainId": "chain-uuid",
|
||||
"sequence": 12345,
|
||||
"policyVersion": "sha256:abc...",
|
||||
"finding": {
|
||||
"id": "artifact:sha256:...|pkg:npm/lodash",
|
||||
"artifactId": "sha256:...",
|
||||
"vulnId": "CVE-2025-12345"
|
||||
},
|
||||
"actor": {
|
||||
"id": "user:jane@acme.com",
|
||||
"type": "human"
|
||||
},
|
||||
"occurredAt": "2025-11-29T12:00:00Z",
|
||||
"recordedAt": "2025-11-29T12:00:01Z",
|
||||
"payload": {
|
||||
"previousStatus": "open",
|
||||
"newStatus": "triaged",
|
||||
"reason": "Under investigation"
|
||||
},
|
||||
"evidenceBundleRef": "bundle://tenant/2025/11/29/...",
|
||||
"eventHash": "sha256:...",
|
||||
"previousHash": "sha256:...",
|
||||
"merkleLeafHash": "sha256:..."
|
||||
}
|
||||
```
|
||||
|
||||
### 3.2 Event Types
|
||||
|
||||
| Type | Trigger | Payload |
|------|---------|---------|
| `finding.discovered` | New finding | severity, purl, advisory |
| `finding.status.changed` | State transition | old/new status, reason |
| `finding.verdict.changed` | Policy decision | verdict, rules matched |
| `finding.vex.applied` | VEX override | status, justification |
| `finding.assigned` | Owner change | assignee, team |
| `finding.commented` | Annotation | comment text (redacted) |
| `finding.resolved` | Resolution | resolution type, version |
|
||||
|
||||
### 3.3 Chain Semantics
|
||||
|
||||
- Each tenant has one or more event chains
|
||||
- Events are strictly ordered by sequence number
|
||||
- `previousHash` links to prior event for integrity
|
||||
- Chain forks are prohibited (409 on conflict)
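
The chain rules above can be sketched in a few lines. This is a minimal illustration, not the module's actual implementation: the canonical form (sorted keys, compact JSON), the `GENESIS_HASH` value for the first event, and the names `append_event`, `verify_chain`, and `ChainConflict` are all assumptions made for the example.

```python
import hashlib
import json

# Hypothetical previousHash used by the first event of a chain.
GENESIS_HASH = "sha256:" + "0" * 64


class ChainConflict(Exception):
    """Raised when an append would fork the chain (would map to HTTP 409)."""


def event_hash(body: dict, previous_hash: str) -> str:
    # Assumed canonical form: sorted keys, compact separators, UTF-8.
    canonical = json.dumps(body, sort_keys=True, separators=(",", ":"))
    digest = hashlib.sha256((previous_hash + canonical).encode("utf-8")).hexdigest()
    return "sha256:" + digest


def append_event(chain: list, body: dict, expected_sequence: int, previous_hash: str) -> dict:
    """Append only if the caller builds on the current head of the chain."""
    head_hash = chain[-1]["eventHash"] if chain else GENESIS_HASH
    next_sequence = len(chain) + 1
    if expected_sequence != next_sequence or previous_hash != head_hash:
        raise ChainConflict(f"expected sequence {next_sequence} on top of {head_hash}")
    event = {
        "sequence": next_sequence,
        "previousHash": head_hash,
        "eventHash": event_hash(body, head_hash),
        "body": body,
    }
    chain.append(event)
    return event


def verify_chain(chain: list) -> bool:
    """Recompute every link; any edited or reordered event breaks the chain."""
    previous = GENESIS_HASH
    for index, event in enumerate(chain, start=1):
        if event["sequence"] != index or event["previousHash"] != previous:
            return False
        if event["eventHash"] != event_hash(event["body"], previous):
            return False
        previous = event["eventHash"]
    return True
```

Two writers that both try to append sequence N on the same head cannot both succeed: the second one sees a moved head and gets the conflict, which is the fork prohibition in practice.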
|
||||
|
||||
---
|
||||
|
||||
## 4. Merkle Anchoring
|
||||
|
||||
### 4.1 Tree Structure
|
||||
|
||||
```
              Root Hash
             /         \
      Hash(A+B)       Hash(C+D)
       /     \         /     \
   H(E1)   H(E2)   H(E3)   H(E4)
     |       |       |       |
  Event1  Event2  Event3  Event4
```
|
||||
|
||||
### 4.2 Anchoring Process
|
||||
|
||||
1. **Batch collection** - Events accumulate in windows (default 15 min)
|
||||
2. **Tree construction** - Leaves are event hashes
|
||||
3. **Root computation** - Merkle root represents batch
|
||||
4. **Anchor record** - Root stored with timestamp
|
||||
5. **Optional external** - Root can be published to external ledger
|
||||
|
||||
### 4.3 Configuration
|
||||
|
||||
```yaml
findings:
  ledger:
    merkle:
      batchSize: 1000
      windowDuration: 00:15:00
      algorithm: sha256
      externalAnchor:
        enabled: false
        type: rekor  # or custom
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Projections
|
||||
|
||||
### 5.1 Purpose
|
||||
|
||||
Projections provide **current state** views derived from event history. They are:
|
||||
- Materialized for fast queries
|
||||
- Reconstructible from events
|
||||
- Validated via `cycleHash`
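
To make "reconstructible from events" concrete, here is a minimal sketch of a projection as a fold over the event stream. It handles only two event types, assumes a newly discovered finding starts as `open`, and uses `state_digest` as a stand-in for `cycleHash`, whose real definition is not given in this advisory. All function names are hypothetical.

```python
import hashlib
import json


def apply_event(state: dict, event: dict) -> dict:
    """Return the next projection state; unhandled types only move the pointer."""
    next_state = dict(state)
    payload = event.get("payload", {})
    if event["type"] == "finding.discovered":
        next_state["status"] = "open"  # assumed initial status
        next_state["severity"] = payload.get("severity")
    elif event["type"] == "finding.status.changed":
        next_state["status"] = payload["newStatus"]
    next_state["currentEventId"] = event["id"]
    return next_state


def replay(events: list) -> dict:
    """Full rebuild: fold every event in sequence order."""
    state = {}
    for event in sorted(events, key=lambda e: e["sequence"]):
        state = apply_event(state, event)
    return state


def state_digest(state: dict) -> str:
    # Stand-in for cycleHash: a digest over the canonical JSON of the state.
    canonical = json.dumps(state, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()
```

Under these assumptions, validation amounts to comparing the digest of the materialized row with the digest of a fresh `replay` of the same events; a mismatch means the projection has drifted from the ledger.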
|
||||
|
||||
### 5.2 Finding Projection
|
||||
|
||||
```json
|
||||
{
|
||||
"tenantId": "acme-corp",
|
||||
"findingId": "artifact:sha256:...|pkg:npm/lodash@4.17.20",
|
||||
"policyVersion": "sha256:5f38c...",
|
||||
"status": "triaged",
|
||||
"severity": 6.7,
|
||||
"riskScore": 85.2,
|
||||
"riskSeverity": "high",
|
||||
"riskProfileVersion": "v2.1",
|
||||
"labels": {
|
||||
"kev": true,
|
||||
"runtime": "exposed"
|
||||
},
|
||||
"currentEventId": "uuid",
|
||||
"cycleHash": "sha256:...",
|
||||
"policyRationale": [
|
||||
"explain://tenant/findings/...",
|
||||
"policy://tenant/policy-v1/rationale/accepted"
|
||||
],
|
||||
"updatedAt": "2025-11-29T12:00:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
### 5.3 Projection Refresh
|
||||
|
||||
| Trigger | Action |
|
||||
|---------|--------|
|
||||
| New event | Incremental update |
|
||||
| Policy change | Full recalculation |
|
||||
| Manual request | On-demand rebuild |
|
||||
| Scheduled | Periodic validation |
|
||||
|
||||
---
|
||||
|
||||
## 6. Export Capabilities
|
||||
|
||||
### 6.1 Export Shapes
|
||||
|
||||
| Shape | Description | Use Case |
|
||||
|-------|-------------|----------|
|
||||
| `canonical` | Full event detail | Complete audit |
|
||||
| `compact` | Summary fields only | Quick reports |
|
||||
|
||||
### 6.2 Export Types
|
||||
|
||||
**Findings Export:**
|
||||
```json
|
||||
{
|
||||
"eventSequence": 12345,
|
||||
"observedAt": "2025-11-29T12:00:00Z",
|
||||
"findingId": "artifact:...|pkg:...",
|
||||
"policyVersion": "sha256:...",
|
||||
"status": "triaged",
|
||||
"severity": 6.7,
|
||||
"cycleHash": "sha256:...",
|
||||
"evidenceBundleRef": "bundle://...",
|
||||
"provenance": {
|
||||
"policyVersion": "sha256:...",
|
||||
"cycleHash": "sha256:...",
|
||||
"ledgerEventHash": "sha256:..."
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6.3 Export Formats
|
||||
|
||||
- **JSON** - Paged API responses
|
||||
- **NDJSON** - Streaming exports
|
||||
- **Bundle** - Signed archive packages
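
For the NDJSON format, determinism mostly comes down to fixing row order and key order before serializing. The sketch below assumes rows are ordered by `eventSequence` and serialized as compact, key-sorted JSON; the advisory does not specify the canonical form, so treat both choices and the function names as illustrative.

```python
import hashlib
import json


def to_ndjson(rows: list) -> bytes:
    """Serialize rows so that equal inputs always yield identical bytes."""
    ordered = sorted(rows, key=lambda row: row["eventSequence"])
    lines = (
        json.dumps(row, sort_keys=True, separators=(",", ":"), ensure_ascii=False)
        for row in ordered
    )
    return "".join(line + "\n" for line in lines).encode("utf-8")


def export_digest(rows: list) -> str:
    """Digest that an auditor can recompute from a second export run."""
    return "sha256:" + hashlib.sha256(to_ndjson(rows)).hexdigest()
```

Two exports of the same ledger range should then produce the same digest regardless of the order in which the store returned the rows.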
|
||||
|
||||
---
|
||||
|
||||
## 7. Implementation Strategy
|
||||
|
||||
### 7.1 Phase 1: Core Ledger (Complete)
|
||||
|
||||
- [x] Append-only event store
|
||||
- [x] Hash-linked chains
|
||||
- [x] Basic projection engine
|
||||
- [x] REST API surface
|
||||
|
||||
### 7.2 Phase 2: Merkle & Exports (In Progress)
|
||||
|
||||
- [x] Merkle tree construction
|
||||
- [x] Batch anchoring
|
||||
- [ ] External anchor integration (LEDGER-MERKLE-50-001)
|
||||
- [ ] Deterministic NDJSON exports (LEDGER-EXPORT-51-001)
|
||||
|
||||
### 7.3 Phase 3: Advanced Features (Planned)
|
||||
|
||||
- [ ] Chain integrity verification CLI
|
||||
- [ ] Projection replay tooling
|
||||
- [ ] Cross-tenant federation
|
||||
- [ ] Long-term archival
|
||||
|
||||
---
|
||||
|
||||
## 8. API Surface
|
||||
|
||||
### 8.1 Events
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/v1/ledger/events` | GET | `vuln:audit` | List ledger events |
|
||||
| `/v1/ledger/events` | POST | `vuln:operate` | Append event |
|
||||
|
||||
### 8.2 Projections
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/v1/ledger/projections/findings` | GET | `vuln:view` | List projections |
|
||||
|
||||
### 8.3 Exports
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/v1/ledger/export/findings` | GET | `vuln:audit` | Export findings |
|
||||
| `/v1/ledger/export/vex` | GET | `vuln:audit` | Export VEX |
|
||||
| `/v1/ledger/export/advisories` | GET | `vuln:audit` | Export advisories |
|
||||
| `/v1/ledger/export/sboms` | GET | `vuln:audit` | Export SBOMs |
|
||||
|
||||
### 8.4 Attestations
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/v1/ledger/attestations` | GET | `vuln:audit` | List verifications |
|
||||
|
||||
---
|
||||
|
||||
## 9. Storage Model
|
||||
|
||||
### 9.1 Collections
|
||||
|
||||
| Collection | Purpose | Key Indexes |
|
||||
|------------|---------|-------------|
|
||||
| `ledger_events` | Append-only events | `{tenant, chainId, sequence}` |
|
||||
| `ledger_chains` | Chain metadata | `{tenant, chainId}` |
|
||||
| `ledger_merkle_roots` | Anchor records | `{tenant, batchId, anchoredAt}` |
|
||||
| `finding_projections` | Current state | `{tenant, findingId}` |
|
||||
|
||||
### 9.2 Integrity Constraints
|
||||
|
||||
- Events are append-only (no update/delete)
|
||||
- Sequence numbers strictly monotonic
|
||||
- Hash chain validated on write
|
||||
- Merkle roots immutable
|
||||
|
||||
---
|
||||
|
||||
## 10. Observability
|
||||
|
||||
### 10.1 Metrics
|
||||
|
||||
- `ledger.events.appended_total{tenant,type}`
|
||||
- `ledger.events.rejected_total{reason}`
|
||||
- `ledger.merkle.batches_total`
|
||||
- `ledger.merkle.anchor_latency_seconds`
|
||||
- `ledger.projection.updates_total`
|
||||
- `ledger.projection.staleness_seconds`
|
||||
- `ledger.export.rows_total{type,shape}`
|
||||
|
||||
### 10.2 SLO Targets
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Event append latency | < 50ms p95 |
|
||||
| Projection freshness | < 5 seconds |
|
||||
| Merkle anchor window | 15 minutes |
|
||||
| Export throughput | 10k rows/sec |
|
||||
|
||||
---
|
||||
|
||||
## 11. Security Considerations
|
||||
|
||||
### 11.1 Immutability Guarantees
|
||||
|
||||
- No UPDATE/DELETE operations exposed
|
||||
- Admin override requires audit event
|
||||
- Merkle roots provide tamper evidence
|
||||
- External anchoring for non-repudiation
|
||||
|
||||
### 11.2 Access Control
|
||||
|
||||
- `vuln:view` - Read projections
|
||||
- `vuln:investigate` - Triage actions
|
||||
- `vuln:operate` - State transitions
|
||||
- `vuln:audit` - Export and verify
|
||||
|
||||
### 11.3 Data Protection
|
||||
|
||||
- Sensitive payloads redacted in exports
|
||||
- Comment text hashed, not stored
|
||||
- PII filtered at ingest
|
||||
- Tenant isolation enforced
|
||||
|
||||
---
|
||||
|
||||
## 12. Air-Gap Support
|
||||
|
||||
### 12.1 Offline Bundles
|
||||
|
||||
- Signed NDJSON exports
|
||||
- Merkle proofs included
|
||||
- Time anchors from trusted source
|
||||
- Bundle verification CLI
|
||||
|
||||
### 12.2 Staleness Tracking
|
||||
|
||||
```yaml
airgap:
  staleness:
    warningThresholdDays: 7
    blockThresholdDays: 30
    riskCriticalExportsBlocked: true
```
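
One possible reading of these settings, sketched in code. The advisory does not define the exact semantics, so the inclusive thresholds and the interpretation of `riskCriticalExportsBlocked` (refuse risk-critical exports once the block threshold is reached) are assumptions, as are the function names.

```python
from datetime import datetime, timedelta, timezone


def staleness_state(last_sync: datetime, now: datetime,
                    warning_days: int = 7, block_days: int = 30) -> str:
    """Classify the age of the last trusted sync (thresholds are inclusive here)."""
    age = now - last_sync
    if age >= timedelta(days=block_days):
        return "blocked"
    if age >= timedelta(days=warning_days):
        return "warning"
    return "fresh"


def export_allowed(state: str, risk_critical: bool,
                   risk_critical_exports_blocked: bool = True) -> bool:
    """Assumed policy: only risk-critical exports are refused when blocked."""
    return not (state == "blocked" and risk_critical and risk_critical_exports_blocked)
```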
|
||||
|
||||
---
|
||||
|
||||
## 13. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| Ledger schema | `docs/modules/findings-ledger/schema.md` |
|
||||
| OpenAPI spec | `docs/modules/findings-ledger/openapi/` |
|
||||
| Export guide | `docs/modules/findings-ledger/exports.md` |
|
||||
|
||||
---
|
||||
|
||||
## 14. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0186_0001_0001_record_deterministic_execution.md
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0120_0000_0001_policy_reasoning.md
|
||||
- SPRINT_311_docs_tasks_md_xi.md
|
||||
|
||||
**Key Task IDs:**
|
||||
- `LEDGER-CORE-40-001` - Event store (DONE)
|
||||
- `LEDGER-PROJ-41-001` - Projections (DONE)
|
||||
- `LEDGER-MERKLE-50-001` - Merkle anchoring (IN PROGRESS)
|
||||
- `LEDGER-EXPORT-51-001` - Deterministic exports (IN PROGRESS)
|
||||
- `LEDGER-AIRGAP-56-001` - Bundle provenance (TODO)
|
||||
|
||||
---
|
||||
|
||||
## 15. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Event durability | 100% (no data loss) |
|
||||
| Chain integrity | 100% hash verification |
|
||||
| Projection accuracy | 100% event replay match |
|
||||
| Export determinism | 100% hash reproducibility |
|
||||
| Audit compliance | SOC 2 Type II |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -0,0 +1,331 @@
|
||||
# Graph Analytics and Dependency Insights
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, graph model, and implementation strategy for the Graph module, covering dependency analysis, impact visualization, and offline exports.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
The Graph module provides **dependency analysis and impact visualization** across the vulnerability landscape. Key capabilities:
|
||||
|
||||
- **Unified Graph Model** - Artifacts, components, advisories, policies linked
|
||||
- **Impact Analysis** - Blast radius, affected paths, transitive dependencies
|
||||
- **Policy Overlays** - VEX and policy decisions visualized on graph
|
||||
- **Analytics** - Clustering, centrality, community detection
|
||||
- **Offline Export** - Deterministic graph snapshots for air-gap
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | Graph Requirements | Use Case |
|
||||
|---------|-------------------|----------|
|
||||
| **Security Teams** | Impact analysis | Vulnerability prioritization |
|
||||
| **Developers** | Dependency visualization | Upgrade planning |
|
||||
| **Compliance** | Audit trails | Relationship documentation |
|
||||
| **Management** | Risk dashboards | Portfolio risk view |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most vulnerability tools show flat lists. Stella Ops differentiates with:
|
||||
- **Graph-native architecture** linking all entities
|
||||
- **Impact visualization** showing blast radius
|
||||
- **Policy overlays** embedding decisions in graph
|
||||
- **Offline-compatible** exports for air-gap analysis
|
||||
- **Analytics** for community detection and centrality
|
||||
|
||||
---
|
||||
|
||||
## 3. Graph Model
|
||||
|
||||
### 3.1 Node Types
|
||||
|
||||
| Node | Description | Key Properties |
|
||||
|------|-------------|----------------|
|
||||
| **Artifact** | Image/application digest | tenant, environment, labels |
|
||||
| **Component** | Package version | purl, ecosystem, version |
|
||||
| **File** | Source/binary path | hash, mtime |
|
||||
| **License** | License identifier | spdx-id, restrictions |
|
||||
| **Advisory** | Vulnerability record | cve-id, severity, sources |
|
||||
| **VEXStatement** | VEX decision | status, justification |
|
||||
| **PolicyVersion** | Signed policy pack | version, digest |
|
||||
|
||||
### 3.2 Edge Types
|
||||
|
||||
| Edge | From | To | Properties |
|------|------|-----|------------|
| `DEPENDS_ON` | Component | Component | scope, optional |
| `BUILT_FROM` | Artifact | Component | layer, path |
| `DECLARED_IN` | Component | File | sbom-id |
| `AFFECTED_BY` | Component | Advisory | version-range |
| `VEX_EXEMPTS` | VEXStatement | Advisory | justification |
| `GOVERNS_WITH` | PolicyVersion | Artifact | run-id |
| `OBSERVED_RUNTIME` | Artifact | Component | zastava-event-id |
|
||||
|
||||
### 3.3 Provenance
|
||||
|
||||
Every edge carries:
|
||||
- `createdAt` - UTC timestamp
|
||||
- `sourceDigest` - SRM/SBOM hash
|
||||
- `provenanceRef` - Link to source document
|
||||
|
||||
---
|
||||
|
||||
## 4. Overlay System
|
||||
|
||||
### 4.1 Overlay Types
|
||||
|
||||
| Overlay | Purpose | Content |
|
||||
|---------|---------|---------|
|
||||
| `policy.overlay.v1` | Policy decisions | verdict, severity, rules |
|
||||
| `openvex.v1` | VEX status | status, justification |
|
||||
| `reachability.v1` | Runtime reachability | state, confidence |
|
||||
| `clustering.v1` | Community detection | cluster-id, modularity |
|
||||
| `centrality.v1` | Node importance | degree, betweenness |
|
||||
|
||||
### 4.2 Overlay Structure
|
||||
|
||||
```json
|
||||
{
|
||||
"overlayId": "sha256(tenant|nodeId|overlayKind)",
|
||||
"overlayKind": "policy.overlay.v1",
|
||||
"nodeId": "component:pkg:npm/lodash@4.17.21",
|
||||
"tenant": "acme-corp",
|
||||
"generatedAt": "2025-11-29T12:00:00Z",
|
||||
"content": {
|
||||
"verdict": "blocked",
|
||||
"severity": "critical",
|
||||
"rulesMatched": ["rule-001", "rule-002"],
|
||||
"explainTrace": "sampled trace data..."
|
||||
}
|
||||
}
|
||||
```
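
The `overlayId` expression in the example reads as a content-derived key. A small sketch of that derivation, assuming the three fields are joined with a literal `|`, encoded as UTF-8, and rendered as lowercase hex (none of which the advisory states explicitly):

```python
import hashlib


def overlay_id(tenant: str, node_id: str, overlay_kind: str) -> str:
    """Derive a stable overlay identifier from its three identifying fields."""
    material = "|".join((tenant, node_id, overlay_kind))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

Because the identifier depends only on tenant, node, and overlay kind, regenerating an overlay for the same node replaces the previous record instead of creating a duplicate.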
|
||||
|
||||
---
|
||||
|
||||
## 5. Query Capabilities
|
||||
|
||||
### 5.1 Search API
|
||||
|
||||
```bash
|
||||
POST /graph/search
|
||||
{
|
||||
"tenant": "acme-corp",
|
||||
"query": "severity:critical AND ecosystem:npm",
|
||||
"nodeTypes": ["Component", "Advisory"],
|
||||
"limit": 100
|
||||
}
|
||||
```
|
||||
|
||||
### 5.2 Path Query
|
||||
|
||||
```bash
|
||||
POST /graph/paths
|
||||
{
|
||||
"source": "artifact:sha256:abc123...",
|
||||
"target": "advisory:CVE-2025-12345",
|
||||
"maxDepth": 6,
|
||||
"includeOverlays": true
|
||||
}
|
||||
```
|
||||
|
||||
**Response:**
|
||||
```json
|
||||
{
|
||||
"paths": [
|
||||
{
|
||||
"nodes": ["artifact:sha256:...", "component:pkg:npm/...", "advisory:CVE-..."],
|
||||
"edges": [{"type": "BUILT_FROM"}, {"type": "AFFECTED_BY"}],
|
||||
"length": 2
|
||||
}
|
||||
],
|
||||
"overlays": [
|
||||
{"nodeId": "component:...", "overlayKind": "policy.overlay.v1", "content": {...}}
|
||||
]
|
||||
}
|
||||
```
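
A bounded path query of this kind can be sketched as a breadth-first search that stops at `maxDepth` edges. The in-memory edge list and the `find_paths` name are assumptions for illustration; the real service presumably queries its own store.

```python
from collections import deque


def find_paths(edges: list, source: str, target: str, max_depth: int = 6) -> list:
    """Enumerate simple paths from source to target using at most max_depth edges.

    edges is a list of (from_id, edge_type, to_id) tuples.
    """
    adjacency = {}
    for from_id, edge_type, to_id in edges:
        adjacency.setdefault(from_id, []).append((edge_type, to_id))

    results = []
    queue = deque([([source], [])])
    while queue:
        nodes, edge_types = queue.popleft()
        if nodes[-1] == target:
            results.append({
                "nodes": nodes,
                "edges": [{"type": edge_type} for edge_type in edge_types],
                "length": len(edge_types),
            })
            continue
        if len(edge_types) == max_depth:
            continue  # depth budget exhausted on this branch
        for edge_type, to_id in adjacency.get(nodes[-1], []):
            if to_id not in nodes:  # keep paths simple (no revisits)
                queue.append((nodes + [to_id], edge_types + [edge_type]))
    return results
```

Breadth-first order means shorter paths are returned first, which matches the usual expectation that the most direct exposure is listed at the top.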
|
||||
|
||||
### 5.3 Diff Query
|
||||
|
||||
```bash
|
||||
POST /graph/diff
|
||||
{
|
||||
"snapshotA": "snapshot-2025-11-28",
|
||||
"snapshotB": "snapshot-2025-11-29",
|
||||
"includeOverlays": true
|
||||
}
|
||||
```
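
The response shape of the diff endpoint is not shown here. As a rough sketch of what a snapshot diff computes, assuming each snapshot can be viewed as a mapping from node ID to its properties (the `added`/`removed`/`changed` keys are hypothetical):

```python
def diff_snapshots(snapshot_a: dict, snapshot_b: dict) -> dict:
    """Compare two {nodeId: properties} mappings and report what moved."""
    ids_a, ids_b = set(snapshot_a), set(snapshot_b)
    return {
        "added": sorted(ids_b - ids_a),
        "removed": sorted(ids_a - ids_b),
        "changed": sorted(
            node_id for node_id in ids_a & ids_b
            if snapshot_a[node_id] != snapshot_b[node_id]
        ),
    }
```

Sorting each list keeps the diff itself deterministic, in line with the export guarantees described elsewhere in this advisory.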
|
||||
|
||||
---
|
||||
|
||||
## 6. Analytics Pipeline
|
||||
|
||||
### 6.1 Clustering
|
||||
|
||||
- **Algorithm:** Louvain community detection
|
||||
- **Output:** Cluster IDs per node, modularity score
|
||||
- **Use Case:** Identify tightly coupled component groups
|
||||
|
||||
### 6.2 Centrality
|
||||
|
||||
- **Degree centrality:** Most connected nodes
|
||||
- **Betweenness centrality:** Critical path nodes
|
||||
- **Use Case:** Identify high-impact components
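
Degree centrality is the simpler of the two measures and can be sketched directly; betweenness requires shortest-path counting and is left out. The normalization by `n - 1` and the choice to count both incoming and outgoing edges are conventional but are assumptions here, not something this advisory specifies.

```python
from collections import Counter


def degree_centrality(edges: list) -> dict:
    """Normalized degree per node: (incoming + outgoing edges) / (n - 1).

    edges is a list of (from_id, to_id) pairs; only nodes that appear in
    at least one edge are scored.
    """
    degree = Counter()
    for from_id, to_id in edges:
        degree[from_id] += 1
        degree[to_id] += 1
    node_count = len(degree)
    if node_count < 2:
        return {node: 0.0 for node in degree}
    return {node: count / (node_count - 1) for node, count in degree.items()}
```

A component that many others depend on scores close to 1.0, which is the signal the "high-impact components" use case relies on.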
|
||||
|
||||
### 6.3 Background Processing
|
||||
|
||||
```yaml
analytics:
  enabled: true
  schedule: "0 */6 * * *"  # Every 6 hours
  algorithms:
    - clustering
    - centrality
  snapshotRetention: 30
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Implementation Strategy
|
||||
|
||||
### 7.1 Phase 1: Core Model (Complete)
|
||||
|
||||
- [x] Node/edge schema
|
||||
- [x] SBOM ingestion pipeline
|
||||
- [x] Advisory/VEX linking
|
||||
- [x] Basic search API
|
||||
|
||||
### 7.2 Phase 2: Overlays (In Progress)
|
||||
|
||||
- [x] Policy overlay generation
|
||||
- [x] VEX overlay generation
|
||||
- [ ] Reachability overlay (GRAPH-REACH-50-001)
|
||||
- [ ] Inline overlay in query responses (GRAPH-QUERY-51-001)
|
||||
|
||||
### 7.3 Phase 3: Analytics (Planned)
|
||||
|
||||
- [ ] Clustering algorithm
|
||||
- [ ] Centrality calculations
|
||||
- [ ] Background worker
|
||||
- [ ] Analytics overlays export
|
||||
|
||||
### 7.4 Phase 4: Visualization (Planned)
|
||||
|
||||
- [ ] Console graph viewer
|
||||
- [ ] Impact tree visualization
|
||||
- [ ] Diff visualization
|
||||
|
||||
---
|
||||
|
||||
## 8. API Surface
|
||||
|
||||
### 8.1 Core APIs
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/graph/search` | POST | `graph:read` | Search nodes |
|
||||
| `/graph/query` | POST | `graph:read` | Complex queries |
|
||||
| `/graph/paths` | POST | `graph:read` | Path finding |
|
||||
| `/graph/diff` | POST | `graph:read` | Snapshot diff |
|
||||
| `/graph/nodes/{id}` | GET | `graph:read` | Node detail |
|
||||
|
||||
### 8.2 Export APIs
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/graph/export` | POST | `graph:export` | Start export job |
|
||||
| `/graph/export/{jobId}` | GET | `graph:read` | Job status |
|
||||
| `/graph/export/{jobId}/download` | GET | `graph:export` | Download bundle |
|
||||
|
||||
---
|
||||
|
||||
## 9. Storage Model
|
||||
|
||||
### 9.1 Collections
|
||||
|
||||
| Collection | Purpose | Key Indexes |
|
||||
|------------|---------|-------------|
|
||||
| `graph_nodes` | Node records | `{tenant, nodeType, nodeId}` |
|
||||
| `graph_edges` | Edge records | `{tenant, fromId, toId, edgeType}` |
|
||||
| `graph_overlays` | Overlay data | `{tenant, nodeId, overlayKind}` |
|
||||
| `graph_snapshots` | Point-in-time snapshots | `{tenant, snapshotId}` |
|
||||
|
||||
### 9.2 Export Format
|
||||
|
||||
```
|
||||
graph-export/
|
||||
├── nodes.jsonl # Sorted by nodeId
|
||||
├── edges.jsonl # Sorted by (from, to, type)
|
||||
├── overlays/
|
||||
│ ├── policy.jsonl
|
||||
│ ├── openvex.jsonl
|
||||
│ └── manifest.json
|
||||
└── manifest.json
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 10. Observability
|
||||
|
||||
### 10.1 Metrics
|
||||
|
||||
- `graph_ingest_lag_seconds`
|
||||
- `graph_nodes_total{nodeType}`
|
||||
- `graph_edges_total{edgeType}`
|
||||
- `graph_query_latency_seconds{queryType}`
|
||||
- `graph_analytics_runs_total`
|
||||
- `graph_analytics_clusters_total`
|
||||
|
||||
### 10.2 Offline Support
|
||||
|
||||
- Graph snapshots packaged for Offline Kit
|
||||
- Deterministic NDJSON exports
|
||||
- Overlay manifests with digests
|
||||
|
||||
---
|
||||
|
||||
## 11. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| Graph architecture | `docs/modules/graph/architecture.md` |
|
||||
| Query language | `docs/modules/graph/query-language.md` |
|
||||
| Overlay specification | `docs/modules/graph/overlays.md` |
|
||||
|
||||
---
|
||||
|
||||
## 12. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0141_0001_0001_graph_indexer.md
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0401_0001_0001_reachability_evidence_chain.md
|
||||
- SPRINT_0140_0001_0001_runtime_signals.md
|
||||
|
||||
**Key Task IDs:**
|
||||
- `GRAPH-CORE-40-001` - Core model (DONE)
|
||||
- `GRAPH-INGEST-41-001` - SBOM ingestion (DONE)
|
||||
- `GRAPH-REACH-50-001` - Reachability overlay (IN PROGRESS)
|
||||
- `GRAPH-ANALYTICS-55-001` - Clustering (TODO)
|
||||
- `GRAPH-VIZ-60-001` - Visualization (FUTURE)
|
||||
|
||||
---
|
||||
|
||||
## 13. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Query latency | < 500ms p95 |
|
||||
| Ingestion lag | < 5 minutes |
|
||||
| Path query depth | Up to 6 hops |
|
||||
| Export reproducibility | 100% deterministic |
|
||||
| Analytics freshness | < 6 hours |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -0,0 +1,469 @@
|
||||
# Notification Rules and Alerting Engine
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, rules engine semantics, and implementation strategy for the Notify module, covering channel connectors, throttling, digests, and delivery management.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
The Notify module provides **rules-driven, tenant-aware notification delivery** across security workflows. Key capabilities:
|
||||
|
||||
- **Rules Engine** - Declarative matchers for event routing
|
||||
- **Multi-Channel Delivery** - Slack, Teams, Email, Webhooks
|
||||
- **Noise Control** - Throttling, deduplication, digest windows
|
||||
- **Approval Tokens** - DSSE-signed ack tokens for one-click workflows
|
||||
- **Audit Trail** - Complete delivery history with redacted payloads
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | Notification Requirements | Use Case |
|
||||
|---------|--------------------------|----------|
|
||||
| **Security Teams** | Real-time critical alerts | Incident response |
|
||||
| **DevSecOps** | CI/CD integration | Pipeline notifications |
|
||||
| **Compliance** | Audit trails | Delivery verification |
|
||||
| **Management** | Digest summaries | Executive reporting |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most vulnerability tools offer basic email alerts. Stella Ops differentiates with:
|
||||
- **Rules-based routing** with fine-grained matchers
|
||||
- **Native Slack/Teams integration** with rich formatting
|
||||
- **Digest windows** to prevent alert fatigue
|
||||
- **Cryptographic ack tokens** for approval workflows
|
||||
- **Tenant isolation** with quota controls
|
||||
|
||||
---
|
||||
|
||||
## 3. Rules Engine
|
||||
|
||||
### 3.1 Rule Structure
|
||||
|
||||
```yaml
name: "critical-alerts-prod"
enabled: true
tenant: "acme-corp"

match:
  eventKinds:
    - "scanner.report.ready"
    - "scheduler.rescan.delta"
    - "zastava.admission"
  namespaces: ["prod-*"]
  repos: ["ghcr.io/acme/*"]
  minSeverity: "high"
  kev: true
  verdict: ["fail", "deny"]
  vex:
    includeRejectedJustifications: false

actions:
  - channel: "slack:sec-alerts"
    template: "concise"
    throttle: "5m"

  - channel: "email:soc"
    digest: "hourly"
    template: "detailed"
```
|
||||
|
||||
### 3.2 Matcher Types
|
||||
|
||||
| Matcher | Description | Example |
|---------|-------------|---------|
| `eventKinds` | Event type filter | `["scanner.report.ready"]` |
| `namespaces` | Namespace patterns | `["prod-*", "staging"]` |
| `repos` | Repository patterns | `["ghcr.io/acme/*"]` |
| `minSeverity` | Minimum severity | `"high"` |
| `kev` | KEV-tagged required | `true` |
| `verdict` | Report/admission verdict | `["fail", "deny"]` |
| `labels` | Kubernetes labels | `{"env": "production"}` |
|
||||
|
||||
### 3.3 Evaluation Order
|
||||
|
||||
1. **Tenant check** - Discard if rule tenant ≠ event tenant
|
||||
2. **Kind filter** - Early discard for non-matching kinds
|
||||
3. **Scope match** - Namespace/repo/label matching
|
||||
4. **Delta gates** - Severity threshold evaluation
|
||||
5. **VEX gate** - Filter based on VEX status
|
||||
6. **Throttle/dedup** - Idempotency key check
|
||||
7. **Actions** - Enqueue per-channel jobs
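
Steps 1 through 4 of this order can be sketched as a chain of cheap early exits. The event shape below (`scope.namespace`, `scope.repo`, `payload.verdict`) follows the template variables used later in this advisory, while the location of `severity` and `kev` under `payload`, the severity ordering, and the use of shell-style globbing for patterns are assumptions. The VEX gate, throttling, and action enqueueing are omitted.

```python
from fnmatch import fnmatchcase

# Assumed severity ordering, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def rule_matches(rule: dict, event: dict) -> bool:
    """Evaluate matchers cheapest-first; return False at the first miss."""
    if not rule.get("enabled", True):
        return False
    if rule["tenant"] != event["tenant"]:  # 1. tenant check
        return False
    match = rule["match"]
    kinds = match.get("eventKinds")
    if kinds and event["kind"] not in kinds:  # 2. kind filter
        return False
    scope = event.get("scope", {})
    for matcher, field in (("namespaces", "namespace"), ("repos", "repo")):
        patterns = match.get(matcher)  # 3. scope match
        if patterns and not any(fnmatchcase(scope.get(field, ""), p) for p in patterns):
            return False
    payload = event.get("payload", {})
    minimum = match.get("minSeverity")  # 4. severity and related gates
    if minimum and SEVERITY_ORDER.index(payload["severity"]) < SEVERITY_ORDER.index(minimum):
        return False
    if match.get("kev") and not payload.get("kev", False):
        return False
    verdicts = match.get("verdict")
    if verdicts and payload.get("verdict") not in verdicts:
        return False
    return True
```

Putting the tenant and kind checks first keeps the common case cheap: most events are discarded before any pattern matching runs.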
|
||||
|
||||
---
|
||||
|
||||
## 4. Channel Connectors
|
||||
|
||||
### 4.1 Built-in Channels
|
||||
|
||||
| Channel | Features | Rate Limits |
|---------|----------|-------------|
| **Slack** | Blocks, threads, reactions | 1 msg/sec per channel |
| **Teams** | Adaptive Cards, webhooks | 4 msgs/sec |
| **Email** | HTML+text, attachments | Relay-dependent |
| **Webhook** | JSON, HMAC signing | 10 req/sec |
|
||||
|
||||
### 4.2 Channel Configuration
|
||||
|
||||
```yaml
channels:
  - name: "slack:sec-alerts"
    type: slack
    config:
      channel: "#security-alerts"
      workspace: "acme-corp"
      secretRef: "ref://notify/slack-token"

  - name: "email:soc"
    type: email
    config:
      to: ["soc@acme.com"]
      from: "stellaops@acme.com"
      smtpHost: "smtp.acme.com"
      secretRef: "ref://notify/smtp-creds"

  - name: "webhook:siem"
    type: webhook
    config:
      url: "https://siem.acme.com/api/events"
      signMethod: "ed25519"
      signKeyRef: "ref://notify/webhook-key"
```
|
||||
|
||||
### 4.3 Connector Contract
|
||||
|
||||
```csharp
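using System.Threading;
using System.Threading.Tasks;

public interface INotifyConnector
{
    // Channel type this connector handles, e.g. "slack" or "webhook".
    string Type { get; }
    // Delivers one notification to the channel described by the context.
    Task<DeliveryResult> SendAsync(DeliveryContext ctx, CancellationToken ct);
    // Checks that the configured channel is usable.
    Task<HealthResult> HealthAsync(ChannelConfig cfg, CancellationToken ct);
}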
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Noise Control
|
||||
|
||||
### 5.1 Throttling
|
||||
|
||||
- **Per-action throttle** - Suppress duplicates within window
|
||||
- **Idempotency key** - `hash(ruleId | actionId | event.kind | scope.digest | day)`
|
||||
- **Configurable windows** - 5m, 15m, 1h, 1d
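
A minimal sketch of the key and the throttle check. The advisory writes the key as `hash(...)` without naming the function, so SHA-256 over a `|`-joined string with a UTC calendar day is an assumption, as are the `Throttle` class and its in-memory store.

```python
import hashlib
from datetime import datetime, timedelta, timezone


def idempotency_key(rule_id: str, action_id: str, kind: str,
                    scope_digest: str, when: datetime) -> str:
    """Build the dedup key; the day component is taken in UTC."""
    day = when.astimezone(timezone.utc).strftime("%Y-%m-%d")
    material = "|".join((rule_id, action_id, kind, scope_digest, day))
    return hashlib.sha256(material.encode("utf-8")).hexdigest()


class Throttle:
    """Allow one delivery per key per window; suppress the rest."""

    def __init__(self, window: timedelta):
        self.window = window
        self._last_sent = {}

    def allow(self, key: str, now: datetime) -> bool:
        last = self._last_sent.get(key)
        if last is not None and now - last < self.window:
            return False  # duplicate inside the window
        self._last_sent[key] = now
        return True
```

Suppressed attempts do not extend the window in this sketch, so a steady stream of duplicates still produces one message per window rather than none at all.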
|
||||
|
||||
### 5.2 Digest Windows
|
||||
|
||||
```yaml
actions:
  - channel: "email:weekly-summary"
    digest: "weekly"
    digestOptions:
      maxItems: 100
      groupBy: ["severity", "namespace"]
    template: "digest-summary"
```
|
||||
|
||||
**Behavior:**
|
||||
- Coalesce events within window
|
||||
- Summarize top N items with counts
|
||||
- Flush on window close or max items
|
||||
- Safe truncation with "and X more" links
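
The coalescing behavior can be sketched as a group-and-count over the buffered events. The output shape (`groups` plus a `more` counter for the truncated remainder) and the tie-breaking order are assumptions chosen for the example.

```python
from collections import Counter


def build_digest(events: list, group_by: list, max_items: int) -> dict:
    """Coalesce events into per-group counts and keep only the top groups."""
    counts = Counter(tuple(event[field] for field in group_by) for event in events)
    # Largest groups first; ties broken by key so the output is stable.
    ranked = sorted(counts.items(), key=lambda item: (-item[1], item[0]))
    shown = ranked[:max_items]
    return {
        "groups": [dict(zip(group_by, key), count=count) for key, count in shown],
        "more": len(ranked) - len(shown),  # feeds the "and X more" link
    }
```

With `groupBy: ["severity", "namespace"]`, a week of findings collapses into one line per severity and namespace pair, and anything past `maxItems` is reported only as a count.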
|
||||
|
||||
### 5.3 Quiet Hours
|
||||
|
||||
```yaml
notify:
  quietHours:
    enabled: true
    window: "22:00-06:00"
    timezone: "America/New_York"
    minSeverity: "critical"
```
|
||||
|
||||
Only critical alerts during quiet hours; others deferred to digests.
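
The one subtle part of this rule is that the window wraps past midnight. A small sketch, assuming the caller has already converted the event time to the configured timezone and that severities are ordered as listed (both assumptions, along with the function names):

```python
from datetime import time

# Assumed severity ordering, lowest to highest.
SEVERITY_ORDER = ["low", "medium", "high", "critical"]


def in_quiet_hours(local_time: time, start: time, end: time) -> bool:
    """True when local_time falls inside [start, end), including wrapped windows."""
    if start <= end:
        return start <= local_time < end
    return local_time >= start or local_time < end  # e.g. 22:00-06:00


def deliver_now(severity: str, local_time: time, start: time = time(22, 0),
                end: time = time(6, 0), min_severity: str = "critical") -> bool:
    """Outside quiet hours everything is delivered; inside, only min_severity and above."""
    if not in_quiet_hours(local_time, start, end):
        return True
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(min_severity)
```

Alerts for which `deliver_now` returns `False` would be queued for the next digest rather than dropped.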
|
||||
|
||||
---
|
||||
|
||||
## 6. Templates & Rendering
|
||||
|
||||
### 6.1 Template Engine
|
||||
|
||||
- Handlebars-style safe templates
|
||||
- No arbitrary code execution
|
||||
- Deterministic outputs (stable property order)
|
||||
- Locale-aware formatting
|
||||
|
||||
### 6.2 Template Variables
|
||||
|
||||
| Variable | Description |
|
||||
|----------|-------------|
|
||||
| `event.kind` | Event type |
|
||||
| `event.ts` | Timestamp |
|
||||
| `scope.namespace` | Kubernetes namespace |
|
||||
| `scope.repo` | Repository |
|
||||
| `scope.digest` | Image digest |
|
||||
| `payload.verdict` | Policy verdict |
|
||||
| `payload.delta.newCritical` | New critical count |
|
||||
| `payload.links.ui` | UI deep link |
|
||||
| `topFindings[]` | Top N findings |
|
||||
|
||||
### 6.3 Channel-Specific Rendering
|
||||
|
||||
**Slack:**
|
||||
```json
|
||||
{
|
||||
"blocks": [
|
||||
{"type": "header", "text": {"type": "plain_text", "text": "Policy FAIL: nginx:latest"}},
|
||||
{"type": "section", "text": {"type": "mrkdwn", "text": "*2 critical*, 3 high vulnerabilities"}}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
**Email:**
|
||||
```html
|
||||
<h2>Policy FAIL: nginx:latest</h2>
|
||||
<table>
|
||||
<tr><td>Critical</td><td>2</td></tr>
|
||||
<tr><td>High</td><td>3</td></tr>
|
||||
</table>
|
||||
<a href="https://ui.internal/reports/...">View Details</a>
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 7. Ack Tokens
|
||||
|
||||
### 7.1 Token Structure
|
||||
|
||||
DSSE-signed tokens for one-click acknowledgements:
|
||||
|
||||
```json
|
||||
{
|
||||
"payloadType": "application/vnd.stellaops.notify-ack-token+json",
|
||||
"payload": {
|
||||
"tenant": "acme-corp",
|
||||
"deliveryId": "delivery-123",
|
||||
"notificationId": "notif-456",
|
||||
"channel": "slack:sec-alerts",
|
||||
"webhookUrl": "https://notify.internal/ack",
|
||||
"nonce": "random-nonce",
|
||||
"actions": ["acknowledge", "escalate"],
|
||||
"expiresAt": "2025-11-29T13:00:00Z"
|
||||
},
|
||||
"signatures": [{"keyid": "notify-ack-key-01", "sig": "..."}]
|
||||
}
|
||||
```
|
||||
|
||||
### 7.2 Token Workflow
|
||||
|
||||
1. **Issue** - `POST /notify/ack-tokens/issue`
|
||||
2. **Embed** - Token included in message action button
|
||||
3. **Click** - User clicks button, token sent to webhook
|
||||
4. **Verify** - `POST /notify/ack-tokens/verify`
|
||||
5. **Audit** - Ack event recorded
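
The issue and verify steps can be sketched with the DSSE pre-authentication encoding (PAE). Two things differ from a real deployment and are swapped in only to keep the example runnable with the standard library: an HMAC-SHA256 tag stands in for the actual signature scheme, and nonce reuse is tracked in a plain in-memory set. The payload is base64-encoded as DSSE envelopes normally carry it, whereas the example in section 7.1 shows it expanded for readability. Function names and the single-use nonce policy are assumptions.

```python
import base64
import hashlib
import hmac
import json
from datetime import datetime, timezone

PAYLOAD_TYPE = "application/vnd.stellaops.notify-ack-token+json"


def pae(payload_type: str, body: bytes) -> bytes:
    """DSSE pre-authentication encoding: the bytes that actually get signed."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %b %d %b" % (len(type_bytes), type_bytes, len(body), body)


def issue_ack_token(payload: dict, key: bytes, key_id: str) -> dict:
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    tag = hmac.new(key, pae(PAYLOAD_TYPE, body), hashlib.sha256).digest()  # stand-in signature
    return {
        "payloadType": PAYLOAD_TYPE,
        "payload": base64.b64encode(body).decode("ascii"),
        "signatures": [{"keyid": key_id, "sig": base64.b64encode(tag).decode("ascii")}],
    }


def verify_ack_token(envelope: dict, key: bytes, action: str,
                     now: datetime, seen_nonces: set) -> dict:
    """Check signature, expiry, permitted action, and replay; return the payload."""
    body = base64.b64decode(envelope["payload"])
    expected = hmac.new(key, pae(envelope["payloadType"], body), hashlib.sha256).digest()
    if not any(hmac.compare_digest(expected, base64.b64decode(s["sig"]))
               for s in envelope["signatures"]):
        raise ValueError("bad signature")
    payload = json.loads(body)
    expires_at = datetime.fromisoformat(payload["expiresAt"].replace("Z", "+00:00"))
    if now >= expires_at:
        raise ValueError("token expired")
    if action not in payload["actions"]:
        raise ValueError("action not permitted")
    if payload["nonce"] in seen_nonces:
        raise ValueError("token already used")
    seen_nonces.add(payload["nonce"])
    return payload
```

The signature is checked before the payload is parsed, so an attacker cannot influence expiry or action handling with an unsigned body.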
|
||||
|
||||
### 7.3 Token Rotation
|
||||
|
||||
```bash
|
||||
# Rotate ack token signing key
|
||||
stella notify rotate-ack-key --key-source kms://notify/ack-key
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 8. Implementation Strategy
|
||||
|
||||
### 8.1 Phase 1: Core Engine (Complete)
|
||||
|
||||
- [x] Rules engine with matchers
|
||||
- [x] Slack connector
|
||||
- [x] Teams connector
|
||||
- [x] Email connector
|
||||
- [x] Webhook connector
|
||||
|
||||
### 8.2 Phase 2: Noise Control (Complete)
|
||||
|
||||
- [x] Throttling
|
||||
- [x] Digest windows
|
||||
- [x] Idempotency
|
||||
- [x] Quiet hours
|
||||
|
||||
### 8.3 Phase 3: Ack Tokens (In Progress)
|
||||
|
||||
- [x] Token issuance
|
||||
- [x] Token verification
|
||||
- [ ] Token rotation API (NOTIFY-ACK-45-001)
|
||||
- [ ] Escalation workflows (NOTIFY-ESC-46-001)
|
||||
|
||||
### 8.4 Phase 4: Advanced Features (Planned)
|
||||
|
||||
- [ ] PagerDuty connector
|
||||
- [ ] Jira ticket creation
|
||||
- [ ] In-app notifications
|
||||
- [ ] Anomaly suppression
|
||||
|
||||
---
|
||||
|
||||
## 9. API Surface
|
||||
|
||||
### 9.1 Channels
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/v1/notify/channels` | GET/POST | `notify.read/admin` | List/create channels |
|
||||
| `/api/v1/notify/channels/{id}` | GET/PATCH/DELETE | `notify.admin` | Manage channel |
|
||||
| `/api/v1/notify/channels/{id}/test` | POST | `notify.admin` | Send test message |
|
||||
| `/api/v1/notify/channels/{id}/health` | GET | `notify.read` | Health check |
|
||||
|
||||
### 9.2 Rules
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/v1/notify/rules` | GET/POST | `notify.read/admin` | List/create rules |
|
||||
| `/api/v1/notify/rules/{id}` | GET/PATCH/DELETE | `notify.admin` | Manage rule |
|
||||
| `/api/v1/notify/rules/{id}/test` | POST | `notify.admin` | Dry-run rule |
|
||||
|
||||
### 9.3 Deliveries
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/v1/notify/deliveries` | GET | `notify.read` | List deliveries |
|
||||
| `/api/v1/notify/deliveries/{id}` | GET | `notify.read` | Delivery detail |
|
||||
| `/api/v1/notify/deliveries/{id}/retry` | POST | `notify.admin` | Retry delivery |
|
||||
|
||||
---
|
||||
|
||||
## 10. Event Sources
|
||||
|
||||
### 10.1 Subscribed Events
|
||||
|
||||
| Event | Source | Typical Actions |
|
||||
|-------|--------|-----------------|
|
||||
| `scanner.scan.completed` | Scanner | Immediate/digest |
|
||||
| `scanner.report.ready` | Scanner | Immediate |
|
||||
| `scheduler.rescan.delta` | Scheduler | Immediate/digest |
|
||||
| `attestor.logged` | Attestor | Immediate |
|
||||
| `zastava.admission` | Zastava | Immediate |
|
||||
| `concelier.export.completed` | Concelier | Digest |
|
||||
| `excititor.export.completed` | Excititor | Digest |
|
||||
|
||||
### 10.2 Event Envelope
|
||||
|
||||
```json
|
||||
{
|
||||
"eventId": "uuid",
|
||||
"kind": "scanner.report.ready",
|
||||
"tenant": "acme-corp",
|
||||
"ts": "2025-11-29T12:00:00Z",
|
||||
"actor": "scanner-webservice",
|
||||
"scope": {
|
||||
"namespace": "production",
|
||||
"repo": "ghcr.io/acme/api",
|
||||
"digest": "sha256:..."
|
||||
},
|
||||
"payload": {
|
||||
"reportId": "report-123",
|
||||
"verdict": "fail",
|
||||
"summary": {"total": 12, "blocked": 2},
|
||||
"delta": {"newCritical": 1, "kev": ["CVE-2025-..."]}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 11. Observability
|
||||
|
||||
### 11.1 Metrics
|
||||
|
||||
- `notify.events_consumed_total{kind}`
|
||||
- `notify.rules_matched_total{ruleId}`
|
||||
- `notify.throttled_total{reason}`
|
||||
- `notify.digest_coalesced_total{window}`
|
||||
- `notify.sent_total{channel}`
|
||||
- `notify.failed_total{channel,code}`
|
||||
- `notify.delivery_latency_seconds{channel}`
|
||||
|
||||
### 11.2 SLO Targets
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Event-to-delivery p95 | < 60 seconds |
|
||||
| Failure rate | < 0.5% per hour |
|
||||
| Duplicate rate | < 0.01% |
|
||||
|
||||
---
|
||||
|
||||
## 12. Security Considerations
|
||||
|
||||
### 12.1 Secret Management
|
||||
|
||||
- Secrets stored as references only
|
||||
- Just-in-time fetch at send time
|
||||
- No plaintext in Mongo
|
||||
|
||||
### 12.2 Webhook Signing
|
||||
|
||||
```
|
||||
X-StellaOps-Signature: t=1764417600,v1=abc123...
|
||||
X-StellaOps-Timestamp: 2025-11-29T12:00:00Z
|
||||
```
|
||||
|
||||
- HMAC-SHA256 or Ed25519
|
||||
- Replay window protection
|
||||
- Canonical body hash
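
A minimal sketch of how these three properties could fit together for the HMAC-SHA256 variant. The signed string (`<timestamp>.<sha256 of body>`), the 300-second replay window, and the function names are assumptions for illustration; the source does not specify them.

```python
import hashlib
import hmac

REPLAY_WINDOW_SECONDS = 300  # assumed tolerance; not stated in the source


def sign_webhook(secret, body, timestamp):
    """Return a header value of the form 't=<unix seconds>,v1=<hex mac>'."""
    body_hash = hashlib.sha256(body).hexdigest()  # hash of the exact bytes sent
    signed = f"{timestamp}.{body_hash}".encode("ascii")
    mac = hmac.new(secret, signed, hashlib.sha256).hexdigest()
    return f"t={timestamp},v1={mac}"


def verify_webhook(secret, body, header, now):
    """Check the replay window first, then recompute and compare the MAC."""
    parts = dict(item.split("=", 1) for item in header.split(","))
    timestamp = int(parts["t"])
    if abs(now - timestamp) > REPLAY_WINDOW_SECONDS:
        return False  # too old or too far in the future
    expected = sign_webhook(secret, body, timestamp)
    return hmac.compare_digest(expected, header)
```

Binding the timestamp into the MAC means an attacker cannot refresh `t` on a captured request without invalidating `v1`.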
|
||||
|
||||
### 12.3 Loop Prevention
|
||||
|
||||
- Webhook target allowlist
|
||||
- Event origin tags
|
||||
- Own webhooks rejected
|
||||
|
||||
---
|
||||
|
||||
## 13. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| Notify architecture | `docs/modules/notify/architecture.md` |
|
||||
| Channel schemas | `docs/modules/notify/resources/schemas/` |
|
||||
| Sample payloads | `docs/modules/notify/resources/samples/` |
|
||||
| Bootstrap pack | `docs/modules/notify/bootstrap-pack.md` |
|
||||
|
||||
---
|
||||
|
||||
## 14. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0170_0001_0001_notify_engine.md (NEW)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0171_0001_0002_notify_connectors.md
|
||||
- SPRINT_0172_0001_0003_notify_ack_tokens.md
|
||||
|
||||
**Key Task IDs:**
|
||||
- `NOTIFY-ENGINE-40-001` - Rules engine (DONE)
|
||||
- `NOTIFY-CONN-41-001` - Connectors (DONE)
|
||||
- `NOTIFY-NOISE-42-001` - Throttling/digests (DONE)
|
||||
- `NOTIFY-ACK-45-001` - Token rotation (IN PROGRESS)
|
||||
- `NOTIFY-ESC-46-001` - Escalation workflows (TODO)
|
||||
|
||||
---
|
||||
|
||||
## 15. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Delivery latency | < 60s p95 |
|
||||
| Delivery success rate | > 99.5% |
|
||||
| Duplicate rate | < 0.01% |
|
||||
| Rule evaluation time | < 10ms |
|
||||
| Channel health | 99.9% uptime |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -0,0 +1,432 @@
|
||||
# Orchestrator Event Model and Job Lifecycle
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, job lifecycle semantics, and implementation strategy for the Orchestrator module, covering event models, quota governance, replay semantics, and the TaskRunner bridge.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
The Orchestrator is the **central job coordination layer** for all Stella Ops asynchronous operations. Key capabilities:
|
||||
|
||||
- **Unified Job Lifecycle** - Enqueue, schedule, lease, complete with audit trail
|
||||
- **Quota Governance** - Per-tenant rate limits, burst controls, circuit breakers
|
||||
- **Replay Semantics** - Deterministic job replay for audit and recovery
|
||||
- **TaskRunner Bridge** - Pack-run integration with heartbeats and artifacts
|
||||
- **Event Fan-Out** - SSE/GraphQL feeds for dashboards and notifications
|
||||
- **Offline Export** - Audit bundles for compliance and investigations
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | Orchestration Requirements | Use Case |
|
||||
|---------|---------------------------|----------|
|
||||
| **Enterprise** | Rate limiting, quota management | Multi-team resource sharing |
|
||||
| **MSP/MSSP** | Multi-tenant isolation | Managed security services |
|
||||
| **Compliance Teams** | Audit trails, replay | SOC 2, FedRAMP evidence |
|
||||
| **DevSecOps** | CI/CD integration, webhooks | Pipeline automation |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most vulnerability platforms lack sophisticated job orchestration. Stella Ops differentiates with:
|
||||
- **Deterministic replay** for audit and debugging
|
||||
- **Fine-grained quotas** per tenant/job-type
|
||||
- **Circuit breakers** for automatic failure isolation
|
||||
- **Native pack-run integration** for workflow automation
|
||||
- **Offline-compatible** audit bundles
|
||||
|
||||
---
|
||||
|
||||
## 3. Job Lifecycle Model
|
||||
|
||||
### 3.1 State Machine
|
||||
|
||||
```
|
||||
[Created] --> [Queued] --> [Leased] --> [Running] --> [Completed]
                 |            |             |              |
                 |            |             v              v
                 |            +-------> [Failed] <----[Canceled]
                 |                          |
                 v                          v
            [Throttled]                [Incident]
|
||||
```
|
||||
|
||||
### 3.2 Lifecycle Phases
|
||||
|
||||
| Phase | Description | Transitions |
|
||||
|-------|-------------|-------------|
|
||||
| **Created** | Job request received | -> Queued |
|
||||
| **Queued** | Awaiting scheduling | -> Leased, Throttled |
|
||||
| **Throttled** | Rate limit applied | -> Queued (after delay) |
|
||||
| **Leased** | Worker acquired job | -> Running, Expired |
|
||||
| **Running** | Active execution | -> Completed, Failed, Canceled |
|
||||
| **Completed** | Success, archived | Terminal |
|
||||
| **Failed** | Error, may retry | -> Queued (retry), Incident |
|
||||
| **Canceled** | Operator abort | Terminal |
|
||||
| **Incident** | Escalated failure | Terminal (requires operator) |
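
The transition column can be encoded directly as a lookup table. The sketch below is illustrative only; `TRANSITIONS`, `advance`, and `is_terminal` are hypothetical names. `Expired` is listed as a target of `Leased` but is not described as a phase, so the sketch gives it no outgoing transitions (an assumption).

```python
# Allowed transitions, copied from the lifecycle table.
TRANSITIONS = {
    "Created": {"Queued"},
    "Queued": {"Leased", "Throttled"},
    "Throttled": {"Queued"},              # after the throttle delay
    "Leased": {"Running", "Expired"},
    "Running": {"Completed", "Failed", "Canceled"},
    "Failed": {"Queued", "Incident"},     # retry or escalate
    "Completed": set(),
    "Canceled": set(),
    "Incident": set(),                    # requires operator action
    "Expired": set(),                     # assumption: not defined in the table
}


class InvalidTransition(Exception):
    """Raised when a job is moved along an edge the table does not allow."""


def advance(current, target):
    """Return the new state, or raise if the edge is not in the table."""
    if target not in TRANSITIONS.get(current, set()):
        raise InvalidTransition(f"{current} -> {target} is not allowed")
    return target


def is_terminal(state):
    """A state is terminal when it has no outgoing transitions."""
    return not TRANSITIONS.get(state)
```

Keeping the table as data makes it easy to reject illegal updates at the storage layer and to append every accepted edge to an audit history.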
|
||||
|
||||
### 3.3 Job Request Structure
|
||||
|
||||
```json
|
||||
{
|
||||
"jobId": "uuid",
|
||||
"jobType": "scan|policy-run|export|pack-run|advisory-sync",
|
||||
"tenant": "tenant-id",
|
||||
"priority": "low|normal|high|emergency",
|
||||
"payloadDigest": "sha256:...",
|
||||
"payload": { "imageRef": "nginx:latest", "options": {} },
|
||||
"dependencies": ["job-id-1", "job-id-2"],
|
||||
"idempotencyKey": "unique-request-key",
|
||||
"correlationId": "trace-id",
|
||||
"requestedBy": "user-id|service-id",
|
||||
"requestedAt": "2025-11-29T12:00:00Z"
|
||||
}
|
||||
```
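
A sketch of how `payloadDigest` and `idempotencyKey` might be used together at enqueue time. The canonicalization (sorted keys, compact separators, UTF-8) and the in-memory `JobStore` are assumptions for illustration; the source does not define how the digest is computed or where idempotency keys are stored.

```python
import hashlib
import json
import uuid


def payload_digest(payload):
    """Digest of a canonical JSON form so key order does not change the hash."""
    canonical = json.dumps(payload, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


class JobStore:
    """Hypothetical in-memory store that deduplicates on (tenant, idempotencyKey)."""

    def __init__(self):
        self._by_key = {}

    def enqueue(self, request):
        key = (request["tenant"], request["idempotencyKey"])
        existing = self._by_key.get(key)
        if existing is not None:
            return existing  # repeated request: return the original job
        job = dict(
            request,
            jobId=str(uuid.uuid4()),
            payloadDigest=payload_digest(request["payload"]),
        )
        self._by_key[key] = job
        return job
```

Scoping the key by tenant keeps one tenant's retries from colliding with another tenant's requests.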
|
||||
|
||||
---
|
||||
|
||||
## 4. Quota Governance
|
||||
|
||||
### 4.1 Quota Model
|
||||
|
||||
```yaml
|
||||
quotas:
|
||||
- tenant: "acme-corp"
|
||||
jobType: "*"
|
||||
maxActive: 50
|
||||
maxPerHour: 500
|
||||
burst: 10
|
||||
priority:
|
||||
emergency:
|
||||
maxActive: 5
|
||||
skipQueue: true
|
||||
|
||||
- tenant: "acme-corp"
|
||||
jobType: "export"
|
||||
maxActive: 4
|
||||
maxPerHour: 100
|
||||
```
|
||||
|
||||
### 4.2 Rate Limit Enforcement
|
||||
|
||||
1. **Quota Check** - Before leasing, verify tenant hasn't exceeded limits
|
||||
2. **Burst Control** - Allow short bursts within configured window
|
||||
3. **Staging** - Jobs exceeding limits staged with `nextEligibleAt` timestamp
|
||||
4. **Priority Bypass** - Emergency jobs can skip queue (with separate limits)
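
The first three steps can be sketched with a token bucket. This is one possible reading, not the Orchestrator's algorithm: it assumes `burst` is the bucket capacity and `maxPerHour` is the refill rate, and the names `Quota`, `TenantState`, and `try_lease` are hypothetical. Priority bypass is left out.

```python
from dataclasses import dataclass


@dataclass
class Quota:
    max_active: int
    max_per_hour: int
    burst: int


@dataclass
class TenantState:
    active: int = 0           # jobs currently leased or running
    tokens: float = 0.0       # remaining burst allowance
    last_refill: float = 0.0  # seconds, same clock as `now`


def try_lease(quota, state, now):
    """Return (allowed, next_eligible_at). next_eligible_at is None when unknown."""
    rate = quota.max_per_hour / 3600.0  # tokens per second
    elapsed = now - state.last_refill
    state.tokens = min(float(quota.burst), state.tokens + elapsed * rate)
    state.last_refill = now

    if state.active >= quota.max_active:
        # Eligible again only once an active job finishes; no fixed time.
        return False, None
    if state.tokens < 1.0:
        # Stage the job until enough allowance has accumulated.
        return False, now + (1.0 - state.tokens) / rate

    state.tokens -= 1.0
    state.active += 1
    return True, None
```

A caller would decrement `state.active` when a job reaches a terminal state, and store the returned timestamp as `nextEligibleAt` on staged jobs.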
|
||||
|
||||
### 4.3 Dynamic Controls
|
||||
|
||||
| Control | API | Purpose |
|
||||
|---------|-----|---------|
|
||||
| `pauseSource` | `POST /api/limits/pause` | Halt specific job sources |
|
||||
| `resumeSource` | `POST /api/limits/resume` | Resume paused sources |
|
||||
| `throttle` | `POST /api/limits/throttle` | Apply temporary throttle |
|
||||
| `updateQuota` | `PATCH /api/quotas/{id}` | Modify quota limits |
|
||||
|
||||
### 4.4 Circuit Breakers
|
||||
|
||||
- Auto-pause job types when failure rate > threshold (default 50%)
|
||||
- Incident events generated via Notify
|
||||
- Half-open testing after cooldown period
|
||||
- Manual reset via operator action
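
A minimal sketch of the closed, open, and half-open cycle described above. The 50% threshold comes from the text; the minimum sample count, the cooldown length, and the class and method names are assumptions.

```python
class CircuitBreaker:
    """Per-job-type breaker: closed -> open -> half-open -> closed."""

    def __init__(self, threshold=0.5, min_samples=10, cooldown=60.0):
        self.threshold = threshold      # open when failure rate exceeds this
        self.min_samples = min_samples  # assumed: avoid tripping on tiny samples
        self.cooldown = cooldown        # assumed: seconds before a probe is allowed
        self.reset()

    def reset(self):
        """Manual operator reset, also used after a successful probe."""
        self.state = "closed"
        self.successes = 0
        self.failures = 0
        self.opened_at = None

    def _open(self, now):
        self.state = "open"
        self.opened_at = now

    def allow(self, now):
        """Should a job of this type be leased right now?"""
        if self.state == "closed":
            return True
        if self.state == "open" and now - self.opened_at >= self.cooldown:
            self.state = "half-open"
            return True  # let exactly one probe job through
        return False     # still cooling down, or a probe is already outstanding

    def record(self, success, now):
        """Feed back the outcome of a finished job."""
        if self.state == "half-open":
            if success:
                self.reset()
            else:
                self._open(now)
            return
        if success:
            self.successes += 1
        else:
            self.failures += 1
        total = self.successes + self.failures
        if total >= self.min_samples and self.failures / total > self.threshold:
            self._open(now)
```

In a fuller version the transition to `open` would also emit the incident event mentioned in the list.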
|
||||
|
||||
---
|
||||
|
||||
## 5. TaskRunner Bridge
|
||||
|
||||
### 5.1 Pack-Run Integration
|
||||
|
||||
The Orchestrator provides specialized support for TaskRunner pack executions:
|
||||
|
||||
```json
|
||||
{
|
||||
"jobType": "pack-run",
|
||||
"payload": {
|
||||
"packId": "vuln-scan-and-report",
|
||||
"packVersion": "1.2.0",
|
||||
"planHash": "sha256:...",
|
||||
"inputs": { "imageRef": "nginx:latest" },
|
||||
"artifacts": [],
|
||||
"logChannel": "sse:/runs/{runId}/logs",
|
||||
"heartbeatCadence": 30
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 5.2 Heartbeat Protocol
|
||||
|
||||
- Workers send heartbeats every `heartbeatCadence` seconds
|
||||
- Missed heartbeats trigger lease expiration
|
||||
- Lease can be extended for long-running tasks
|
||||
- Dead workers detected within 2x heartbeat interval
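
The rules above can be sketched as a small lease record plus a reaper. The sketch assumes a lease expires once more than two heartbeat intervals pass without a heartbeat, which is one reading of the "2x" rule; `Lease` and `reap_expired` are hypothetical names.

```python
from dataclasses import dataclass


@dataclass
class Lease:
    job_id: str
    heartbeat_cadence: int  # seconds, as in the pack-run payload
    last_heartbeat: float   # seconds on a monotonic clock

    def heartbeat(self, now):
        """Record a heartbeat, which extends the lease."""
        self.last_heartbeat = now

    def expired(self, now):
        """True once more than 2x the cadence has passed without a heartbeat."""
        return now - self.last_heartbeat > 2 * self.heartbeat_cadence


def reap_expired(leases, now):
    """Return job ids whose workers are presumed dead."""
    return [lease.job_id for lease in leases if lease.expired(now)]
```

A scheduler loop would call `reap_expired` periodically and move the returned jobs out of the leased state so they can be retried.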
|
||||
|
||||
### 5.3 Artifact & Log Streaming
|
||||
|
||||
| Endpoint | Method | Purpose |
|
||||
|----------|--------|---------|
|
||||
| `/runs/{runId}/logs` | SSE | Stream execution logs |
|
||||
| `/runs/{runId}/artifacts` | GET | List produced artifacts |
|
||||
| `/runs/{runId}/artifacts/{name}` | GET | Download artifact |
|
||||
| `/runs/{runId}/heartbeat` | POST | Extend lease |
|
||||
|
||||
---
|
||||
|
||||
## 6. Event Model
|
||||
|
||||
### 6.1 Event Envelope
|
||||
|
||||
```json
|
||||
{
|
||||
"eventId": "uuid",
|
||||
"eventType": "job.queued|job.leased|job.completed|job.failed",
|
||||
"timestamp": "2025-11-29T12:00:00Z",
|
||||
"tenant": "tenant-id",
|
||||
"jobId": "job-id",
|
||||
"jobType": "scan",
|
||||
"correlationId": "trace-id",
|
||||
"idempotencyKey": "unique-key",
|
||||
"payload": {
|
||||
"status": "completed",
|
||||
"duration": 45.2,
|
||||
"result": { "verdict": "pass" }
|
||||
},
|
||||
"provenance": {
|
||||
"workerId": "worker-1",
|
||||
"leaseId": "lease-id",
|
||||
"taskRunnerId": "runner-1"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2 Event Types
|
||||
|
||||
| Event | Trigger | Consumers |
|
||||
|-------|---------|-----------|
|
||||
| `job.queued` | Job enqueued | Dashboard, Notify |
|
||||
| `job.leased` | Worker acquired job | Dashboard |
|
||||
| `job.started` | Execution began | Dashboard, Notify |
|
||||
| `job.progress` | Progress update | Dashboard (SSE) |
|
||||
| `job.completed` | Success | Dashboard, Notify, Export |
|
||||
| `job.failed` | Error occurred | Dashboard, Notify, Incident |
|
||||
| `job.canceled` | Operator abort | Dashboard, Notify |
|
||||
| `job.replayed` | Replay initiated | Dashboard, Audit |
|
||||
|
||||
### 6.3 Fan-Out Channels
|
||||
|
||||
- **SSE** - Real-time dashboard feeds
|
||||
- **GraphQL Subscriptions** - Console UI
|
||||
- **Notify** - Alert routing based on rules
|
||||
- **Webhooks** - External integrations
|
||||
- **Audit Log** - Compliance storage
|
||||
|
||||
---
|
||||
|
||||
## 7. Replay Semantics
|
||||
|
||||
### 7.1 Deterministic Replay
|
||||
|
||||
Jobs can be replayed for audit, debugging, or recovery:
|
||||
|
||||
```bash
|
||||
# Replay a completed job
|
||||
stella job replay --id job-12345
|
||||
|
||||
# Replay with sealed mode (offline verification)
|
||||
stella job replay --id job-12345 --sealed --bundle output.tar.gz
|
||||
```
|
||||
|
||||
### 7.2 Replay Guarantees
|
||||
|
||||
| Property | Guarantee |
|
||||
|----------|-----------|
|
||||
| **Input preservation** | Same payloadDigest, cursors |
|
||||
| **Ordering** | Same processing order |
|
||||
| **Determinism** | Same outputs for same inputs |
|
||||
| **Provenance** | `replayOf` pointer to original |
|
||||
|
||||
### 7.3 Replay Record
|
||||
|
||||
```json
|
||||
{
|
||||
"jobId": "replay-job-id",
|
||||
"replayOf": "original-job-id",
|
||||
"priority": "high",
|
||||
"reason": "audit-verification",
|
||||
"requestedBy": "auditor@example.com",
|
||||
"cursors": {
|
||||
"advisory": "cursor-abc",
|
||||
"vex": "cursor-def"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 8. Implementation Strategy
|
||||
|
||||
### 8.1 Phase 1: Core Lifecycle (Complete)
|
||||
|
||||
- [x] Job state machine
|
||||
- [x] MongoDB queue with leasing
|
||||
- [x] Basic quota enforcement
|
||||
- [x] Dashboard SSE feeds
|
||||
|
||||
### 8.2 Phase 2: Pack-Run Bridge (In Progress)
|
||||
|
||||
- [x] Pack-run job type registration
|
||||
- [x] Log/artifact streaming
|
||||
- [ ] Heartbeat protocol (ORCH-PACK-37-001)
|
||||
- [ ] Event envelope finalization (ORCH-SVC-37-101)
|
||||
|
||||
### 8.3 Phase 3: Advanced Controls (Planned)
|
||||
|
||||
- [ ] Circuit breaker automation
|
||||
- [ ] Quota analytics dashboard
|
||||
- [ ] Replay verification tooling
|
||||
- [ ] Incident mode integration
|
||||
|
||||
---
|
||||
|
||||
## 9. API Surface
|
||||
|
||||
### 9.1 Job Management
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/jobs` | GET | `orch:read` | List jobs with filters |
|
||||
| `/api/jobs/{id}` | GET | `orch:read` | Job detail |
|
||||
| `/api/jobs/{id}/cancel` | POST | `orch:operate` | Cancel job |
|
||||
| `/api/jobs/{id}/replay` | POST | `orch:operate` | Schedule replay |
|
||||
|
||||
### 9.2 Quota Management
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/quotas` | GET | `orch:read` | List quotas |
|
||||
| `/api/quotas/{id}` | PATCH | `orch:quota` | Update quota |
|
||||
| `/api/limits/throttle` | POST | `orch:quota` | Apply throttle |
|
||||
| `/api/limits/pause` | POST | `orch:quota` | Pause source |
|
||||
| `/api/limits/resume` | POST | `orch:quota` | Resume source |
|
||||
|
||||
### 9.3 Dashboard
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/dashboard/metrics` | GET | `orch:read` | Aggregated metrics |
|
||||
| `/api/dashboard/events` | SSE | `orch:read` | Real-time events |
|
||||
|
||||
---
|
||||
|
||||
## 10. Storage Model
|
||||
|
||||
### 10.1 Collections
|
||||
|
||||
| Collection | Purpose | Key Fields |
|
||||
|------------|---------|------------|
|
||||
| `jobs` | Current job state | `_id`, `tenant`, `jobType`, `status`, `priority` |
|
||||
| `job_history` | Append-only audit | `jobId`, `event`, `timestamp`, `actor` |
|
||||
| `sources` | Job sources registry | `sourceId`, `tenant`, `status` |
|
||||
| `quotas` | Quota definitions | `tenant`, `jobType`, `limits` |
|
||||
| `throttles` | Active throttles | `tenant`, `source`, `until` |
|
||||
| `incidents` | Escalated failures | `jobId`, `reason`, `status` |
|
||||
|
||||
### 10.2 Indexes
|
||||
|
||||
- `{tenant, jobType, status}` on `jobs`
|
||||
- `{tenant, status, startedAt}` on `jobs`
|
||||
- `{jobId, timestamp}` on `job_history`
|
||||
- TTL index on transient lease records
|
||||
|
||||
---
|
||||
|
||||
## 11. Observability
|
||||
|
||||
### 11.1 Metrics
|
||||
|
||||
- `job_queue_depth{jobType,tenant}`
|
||||
- `job_latency_seconds{jobType,phase}`
|
||||
- `job_failures_total{jobType,reason}`
|
||||
- `job_retry_total{jobType}`
|
||||
- `lease_extensions_total{jobType}`
|
||||
- `quota_exceeded_total{tenant}`
|
||||
- `circuit_breaker_state{jobType}`
|
||||
|
||||
### 11.2 Pack-Run Metrics
|
||||
|
||||
- `pack_run_logs_stream_lag_seconds`
|
||||
- `pack_run_heartbeats_total`
|
||||
- `pack_run_artifacts_total`
|
||||
- `pack_run_duration_seconds`
|
||||
|
||||
---
|
||||
|
||||
## 12. Offline Support
|
||||
|
||||
### 12.1 Audit Bundle Export
|
||||
|
||||
```bash
|
||||
stella orch export --tenant acme-corp --since 2025-11-01 --output audit-bundle.tar.gz
|
||||
```
|
||||
|
||||
Bundle contents:
|
||||
- `jobs.jsonl` - Job records
|
||||
- `history.jsonl` - State transitions
|
||||
- `throttles.jsonl` - Throttle events
|
||||
- `manifest.json` - Bundle metadata
|
||||
- `signatures/` - DSSE signatures
|
||||
|
||||
### 12.2 Replay Verification
|
||||
|
||||
```bash
|
||||
# Verify job determinism
|
||||
stella job verify --bundle audit-bundle.tar.gz --job-id job-12345
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 13. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| Orchestrator architecture | `docs/modules/orchestrator/architecture.md` |
|
||||
| Event envelope spec | `docs/modules/orchestrator/event-envelope.md` |
|
||||
| TaskRunner integration | `docs/modules/taskrunner/orchestrator-bridge.md` |
|
||||
|
||||
---
|
||||
|
||||
## 14. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0151_0001_0001_orchestrator_i.md
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0152_0001_0002_orchestrator_ii.md
|
||||
- SPRINT_0153_0001_0003_orchestrator_iii.md
|
||||
- SPRINT_0157_0001_0001_taskrunner_i.md
|
||||
|
||||
**Key Task IDs:**
|
||||
- `ORCH-CORE-30-001` - Job lifecycle (DONE)
|
||||
- `ORCH-QUOTA-31-001` - Quota governance (DONE)
|
||||
- `ORCH-PACK-37-001` - Pack-run bridge (IN PROGRESS)
|
||||
- `ORCH-SVC-37-101` - Event envelope (IN PROGRESS)
|
||||
- `ORCH-REPLAY-38-001` - Replay verification (TODO)
|
||||
|
||||
---
|
||||
|
||||
## 15. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Job scheduling latency | < 100ms p99 |
|
||||
| Lease acquisition time | < 50ms p99 |
|
||||
| Event fan-out delay | < 500ms |
|
||||
| Quota enforcement accuracy | 100% |
|
||||
| Replay determinism | 100% match |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -0,0 +1,394 @@
|
||||
# Policy Simulation and Shadow Gates
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, simulation semantics, and implementation strategy for Policy Engine simulation features, covering shadow runs, coverage fixtures, and promotion gates.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
Policy simulation enables **safe testing of policy changes** before production deployment. Key capabilities:
|
||||
|
||||
- **Shadow Runs** - Execute policies without enforcement
|
||||
- **Diff Summaries** - Compare old vs new policy outcomes
|
||||
- **Coverage Fixtures** - Validate expected findings
|
||||
- **Promotion Gates** - Block promotion until tests pass
|
||||
- **Deterministic Replay** - Reproduce simulation results
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | Simulation Requirements | Use Case |
|
||||
|---------|------------------------|----------|
|
||||
| **Policy Authors** | Preview changes | Development workflow |
|
||||
| **Security Leads** | Approve promotions | Change management |
|
||||
| **Compliance** | Audit trail | Policy change evidence |
|
||||
| **DevSecOps** | CI integration | Automated testing |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most vulnerability tools lack policy simulation. Stella Ops differentiates with:
|
||||
- **Shadow execution** without production impact
|
||||
- **Diff visualization** of policy changes
|
||||
- **Coverage testing** with fixture validation
|
||||
- **Promotion gates** for governance
|
||||
- **Deterministic replay** for audit
|
||||
|
||||
---
|
||||
|
||||
## 3. Simulation Modes
|
||||
|
||||
### 3.1 Shadow Run
|
||||
|
||||
Execute policy against real data without enforcement:
|
||||
|
||||
```bash
|
||||
stella policy simulate \
|
||||
--policy my-policy:v2 \
|
||||
--scope "tenant:acme-corp,namespace:production" \
|
||||
--shadow
|
||||
```
|
||||
|
||||
**Behavior:**
|
||||
- Evaluates all findings
|
||||
- Records verdicts to shadow collections
|
||||
- No enforcement actions
|
||||
- No notifications triggered
|
||||
- Metrics tagged with `shadow=true`
|
||||
|
||||
### 3.2 Diff Run
|
||||
|
||||
Compare two policy versions:
|
||||
|
||||
```bash
|
||||
stella policy diff \
|
||||
--old my-policy:v1 \
|
||||
--new my-policy:v2 \
|
||||
--scope "tenant:acme-corp"
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```json
|
||||
{
|
||||
"summary": {
|
||||
"added": 12,
|
||||
"removed": 5,
|
||||
"changed": 8,
|
||||
"unchanged": 234
|
||||
},
|
||||
"changes": [
|
||||
{
|
||||
"findingId": "finding-123",
|
||||
"cve": "CVE-2025-12345",
|
||||
"oldVerdict": "warned",
|
||||
"newVerdict": "blocked",
|
||||
"reason": "rule 'critical-cves' now matches"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
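
A sketch of how such a summary could be derived from two verdict maps. It assumes "added" and "removed" mean findings present under only the new or only the old policy; the function name and input shape are hypothetical, and the `reason` field is omitted because it needs rule-level detail.

```python
def diff_verdicts(old, new):
    """Compare {findingId: verdict} maps produced by two policy versions."""
    summary = {"added": 0, "removed": 0, "changed": 0, "unchanged": 0}
    changes = []
    # Sorted iteration keeps the output order stable across runs.
    for finding_id in sorted(old.keys() | new.keys()):
        before = old.get(finding_id)
        after = new.get(finding_id)
        if before is None:
            summary["added"] += 1
        elif after is None:
            summary["removed"] += 1
        elif before == after:
            summary["unchanged"] += 1
        else:
            summary["changed"] += 1
            changes.append(
                {"findingId": finding_id, "oldVerdict": before, "newVerdict": after}
            )
    return {"summary": summary, "changes": changes}
```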
|
||||
|
||||
### 3.3 Coverage Run
|
||||
|
||||
Validate policy against fixture expectations:
|
||||
|
||||
```bash
|
||||
stella policy coverage \
|
||||
--policy my-policy:v2 \
|
||||
--fixtures fixtures/policy-tests.yaml
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 4. Coverage Fixtures
|
||||
|
||||
### 4.1 Fixture Format
|
||||
|
||||
```yaml
|
||||
apiVersion: stellaops.io/policy-test.v1
|
||||
kind: PolicyFixture
|
||||
metadata:
|
||||
name: critical-cve-blocking
|
||||
policy: my-policy
|
||||
|
||||
fixtures:
|
||||
- name: "Block critical CVE in production"
|
||||
input:
|
||||
finding:
|
||||
cve: "CVE-2025-12345"
|
||||
severity: critical
|
||||
ecosystem: npm
|
||||
component: "lodash@4.17.20"
|
||||
context:
|
||||
namespace: production
|
||||
labels:
|
||||
tier: frontend
|
||||
expected:
|
||||
verdict: blocked
|
||||
rulesMatched: ["critical-cves", "production-strict"]
|
||||
|
||||
- name: "Warn on high CVE in staging"
|
||||
input:
|
||||
finding:
|
||||
cve: "CVE-2025-12346"
|
||||
severity: high
|
||||
ecosystem: npm
|
||||
expected:
|
||||
verdict: warned
|
||||
|
||||
- name: "Ignore low CVE with VEX"
|
||||
input:
|
||||
finding:
|
||||
cve: "CVE-2025-12347"
|
||||
severity: low
|
||||
vexStatus: not_affected
|
||||
vexJustification: "component_not_present"
|
||||
expected:
|
||||
verdict: ignored
|
||||
```
|
||||
|
||||
### 4.2 Fixture Results
|
||||
|
||||
```json
|
||||
{
|
||||
"total": 25,
|
||||
"passed": 23,
|
||||
"failed": 2,
|
||||
"failures": [
|
||||
{
|
||||
"fixture": "Block critical CVE in production",
|
||||
"expected": {"verdict": "blocked"},
|
||||
"actual": {"verdict": "warned"},
|
||||
"diff": "rule 'critical-cves' did not match due to missing label"
|
||||
}
|
||||
]
|
||||
}
|
||||
```
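
A sketch of a coverage runner that produces a result of this shape. It assumes fixtures are already parsed into dictionaries and that the policy is exposed as a callable returning at least a `verdict`; `run_coverage` and `toy_policy` are hypothetical, and the toy rules exist only to exercise the runner.

```python
def run_coverage(fixtures, evaluate):
    """Evaluate each fixture input and compare only the expected keys."""
    failures = []
    for fixture in fixtures:
        actual = evaluate(fixture["input"])
        expected = fixture["expected"]
        if any(actual.get(key) != value for key, value in expected.items()):
            failures.append(
                {
                    "fixture": fixture["name"],
                    "expected": expected,
                    "actual": {key: actual.get(key) for key in expected},
                }
            )
    total = len(fixtures)
    return {
        "total": total,
        "passed": total - len(failures),
        "failed": len(failures),
        "failures": failures,
    }


def toy_policy(policy_input):
    """Stand-in policy: VEX not_affected wins, then severity decides."""
    finding = policy_input["finding"]
    if finding.get("vexStatus") == "not_affected":
        return {"verdict": "ignored"}
    if finding["severity"] == "critical":
        return {"verdict": "blocked"}
    if finding["severity"] == "high":
        return {"verdict": "warned"}
    return {"verdict": "ignored"}
```

Comparing only the keys a fixture declares lets simple fixtures assert a verdict while stricter ones also pin `rulesMatched`.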
|
||||
|
||||
---
|
||||
|
||||
## 5. Promotion Gates
|
||||
|
||||
### 5.1 Gate Requirements
|
||||
|
||||
Before a policy can be promoted from draft to active:
|
||||
|
||||
| Gate | Requirement | Enforcement |
|
||||
|------|-------------|-------------|
|
||||
| Shadow Run | Complete without errors | Required |
|
||||
| Coverage | 100% of fixtures pass | Required |
|
||||
| Diff Review | Changes reviewed | Optional |
|
||||
| Approval | Human sign-off | Configurable |
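
The gate table can be read as a simple predicate. The sketch below is illustrative; the input field names and the `require_approval` switch are assumptions standing in for the gate configuration, which is still planned.

```python
def evaluate_gates(results, require_approval=True):
    """Return (can_promote, blocking_gates) for a draft policy."""
    blockers = []
    if not results.get("shadow_run_ok", False):
        blockers.append("Shadow Run")
    if results.get("fixtures_passed") != results.get("fixtures_total"):
        blockers.append("Coverage")  # anything below 100% blocks
    # Diff Review is optional, so it never blocks in this sketch.
    if require_approval and not results.get("approved", False):
        blockers.append("Approval")
    return (not blockers, blockers)
```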
|
||||
|
||||
### 5.2 Promotion Workflow
|
||||
|
||||
```mermaid
|
||||
stateDiagram-v2
|
||||
[*] --> Draft
|
||||
Draft --> Shadow: Start shadow run
|
||||
Shadow --> Coverage: Run coverage tests
|
||||
Coverage --> Review: Pass fixtures
|
||||
Review --> Approval: Review diff
|
||||
Approval --> Active: Approve
|
||||
Coverage --> Draft: Fix failures
|
||||
Approval --> Draft: Reject
|
||||
```
|
||||
|
||||
### 5.3 CLI Commands
|
||||
|
||||
```bash
|
||||
# Start shadow run
|
||||
stella policy promote start --policy my-policy:v2
|
||||
|
||||
# Check promotion status
|
||||
stella policy promote status --policy my-policy:v2
|
||||
|
||||
# Complete promotion (requires approval)
|
||||
stella policy promote complete --policy my-policy:v2 --comment "Reviewed and approved"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 6. Determinism Requirements
|
||||
|
||||
### 6.1 Simulation Guarantees
|
||||
|
||||
| Property | Guarantee |
|
||||
|----------|-----------|
|
||||
| Input ordering | Stable sort by (tenant, policyId, findingKey) |
|
||||
| Rule evaluation | First-match semantics |
|
||||
| Timestamp handling | Injected TimeProvider |
|
||||
| Random values | Injected IRandom |
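
The first two guarantees can be illustrated with a short sketch: inputs are sorted by the stated key before evaluation, and the first matching rule decides the verdict. The rule representation (name, predicate, verdict), the default verdict, and the function names are assumptions.

```python
def evaluate(finding, rules, default="ignored"):
    """First-match semantics: the earliest rule whose predicate holds wins."""
    for name, predicate, verdict in rules:
        if predicate(finding):
            return verdict, name
    return default, None


def simulate(findings, rules):
    """Evaluate findings in a stable order, independent of arrival order."""
    ordered = sorted(
        findings, key=lambda f: (f["tenant"], f["policyId"], f["findingKey"])
    )
    return [(f["findingKey"],) + evaluate(f, rules) for f in ordered]
```

Because the order is fixed before evaluation, two runs over the same findings yield the same result list even if the findings were fetched in a different order.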
|
||||
|
||||
### 6.2 Replay Hash
|
||||
|
||||
Each simulation computes:
|
||||
```
|
||||
determinismHash = SHA256(policyVersion + inputsHash + rulesHash)
|
||||
```
|
||||
|
||||
Replays with the same hash must produce identical results.
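
A sketch of the hash computation. The formula is taken from the text as plain concatenation; the UTF-8 encoding and the way `inputsHash` is derived here (SHA-256 over sorted, canonical JSON lines) are assumptions, since the source does not define them.

```python
import hashlib
import json


def inputs_hash(findings):
    """Order-independent hash of the simulation inputs (assumed construction)."""
    lines = sorted(
        json.dumps(f, sort_keys=True, separators=(",", ":")) for f in findings
    )
    return hashlib.sha256("\n".join(lines).encode("utf-8")).hexdigest()


def determinism_hash(policy_version, inputs_digest, rules_digest):
    """SHA256(policyVersion + inputsHash + rulesHash), as stated above."""
    material = policy_version + inputs_digest + rules_digest
    return hashlib.sha256(material.encode("utf-8")).hexdigest()
```

A replay check would recompute the hash for the replayed run and compare both the hash and the produced verdicts against the original record.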
|
||||
|
||||
---
|
||||
|
||||
## 7. Implementation Strategy
|
||||
|
||||
### 7.1 Phase 1: Shadow Runs (Complete)
|
||||
|
||||
- [x] Shadow collection isolation
|
||||
- [x] Shadow metrics tagging
|
||||
- [x] Shadow run API
|
||||
- [x] CLI integration
|
||||
|
||||
### 7.2 Phase 2: Diff & Coverage (In Progress)
|
||||
|
||||
- [x] Policy diff algorithm
|
||||
- [x] Diff visualization
|
||||
- [ ] Coverage fixture parser (POLICY-COV-50-001)
|
||||
- [ ] Coverage runner (POLICY-COV-50-002)
|
||||
|
||||
### 7.3 Phase 3: Promotion Gates (Planned)
|
||||
|
||||
- [ ] Gate configuration schema
|
||||
- [ ] Promotion state machine
|
||||
- [ ] Approval workflow integration
|
||||
- [ ] Console UI for review
|
||||
|
||||
---
|
||||
|
||||
## 8. API Surface
|
||||
|
||||
### 8.1 Simulation APIs
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/policy/simulate` | POST | `policy:simulate` | Start simulation |
|
||||
| `/api/policy/simulate/{id}` | GET | `policy:read` | Get simulation status |
|
||||
| `/api/policy/simulate/{id}/results` | GET | `policy:read` | Get results |
|
||||
|
||||
### 8.2 Diff APIs
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/policy/diff` | POST | `policy:read` | Compare versions |
|
||||
|
||||
### 8.3 Coverage APIs
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/policy/coverage` | POST | `policy:simulate` | Run coverage |
|
||||
| `/api/policy/coverage/{id}` | GET | `policy:read` | Get results |
|
||||
|
||||
### 8.4 Promotion APIs
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/api/policy/promote` | POST | `policy:promote` | Start promotion |
|
||||
| `/api/policy/promote/{id}` | GET | `policy:read` | Get status |
|
||||
| `/api/policy/promote/{id}/approve` | POST | `policy:approve` | Approve promotion |
|
||||
| `/api/policy/promote/{id}/reject` | POST | `policy:approve` | Reject promotion |
|
||||
|
||||
---
|
||||
|
||||
## 9. Storage Model
|
||||
|
||||
### 9.1 Collections
|
||||
|
||||
| Collection | Purpose |
|
||||
|------------|---------|
|
||||
| `policy_simulations` | Simulation records |
|
||||
| `policy_simulation_results` | Per-finding results |
|
||||
| `policy_coverage_runs` | Coverage executions |
|
||||
| `policy_promotions` | Promotion state |
|
||||
|
||||
### 9.2 Shadow Isolation
|
||||
|
||||
Shadow results stored in separate collections:
|
||||
- `effective_finding_{policyId}_shadow`
|
||||
- Never mixed with production data
|
||||
- TTL-based cleanup (default 7 days)
|
||||
|
||||
---
|
||||
|
||||
## 10. Observability
|
||||
|
||||
### 10.1 Metrics
|
||||
|
||||
- `policy_simulation_duration_seconds{mode}`
|
||||
- `policy_coverage_pass_rate{policy}`
|
||||
- `policy_promotion_gate_status{gate,status}`
|
||||
- `policy_diff_changes_total{changeType}`
|
||||
|
||||
### 10.2 Audit Events
|
||||
|
||||
- `policy.simulation.started`
|
||||
- `policy.simulation.completed`
|
||||
- `policy.coverage.passed`
|
||||
- `policy.coverage.failed`
|
||||
- `policy.promotion.approved`
|
||||
- `policy.promotion.rejected`
|
||||
|
||||
---
|
||||
|
||||
## 11. Console Integration
|
||||
|
||||
### 11.1 Policy Editor
|
||||
|
||||
- Inline simulation button
|
||||
- Real-time diff preview
|
||||
- Coverage status badge
|
||||
|
||||
### 11.2 Promotion Dashboard
|
||||
|
||||
- Pending promotions list
|
||||
- Gate status visualization
|
||||
- Approval/reject actions
|
||||
|
||||
---
|
||||
|
||||
## 12. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| Policy architecture | `docs/modules/policy/architecture.md` |
|
||||
| DSL reference | `docs/policy/dsl.md` |
|
||||
| Lifecycle guide | `docs/policy/lifecycle.md` |
|
||||
| Runtime guide | `docs/policy/runtime.md` |
|
||||
|
||||
---
|
||||
|
||||
## 13. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0185_0001_0001_policy_simulation.md (NEW)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0120_0000_0001_policy_reasoning.md
|
||||
- SPRINT_0121_0001_0001_policy_reasoning.md
|
||||
|
||||
**Key Task IDs:**
|
||||
- `POLICY-SIM-40-001` - Shadow runs (DONE)
|
||||
- `POLICY-DIFF-41-001` - Diff algorithm (DONE)
|
||||
- `POLICY-COV-50-001` - Coverage fixtures (IN PROGRESS)
|
||||
- `POLICY-COV-50-002` - Coverage runner (IN PROGRESS)
|
||||
- `POLICY-PROM-55-001` - Promotion gates (TODO)
|
||||
|
||||
---
|
||||
|
||||
## 14. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Simulation latency | < 2 min (10k findings) |
|
||||
| Coverage accuracy | 100% fixture matching |
|
||||
| Promotion gate enforcement | 100% adherence |
|
||||
| Shadow isolation | Zero production leakage |
|
||||
| Replay determinism | 100% hash match |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -0,0 +1,444 @@
|
||||
# Runtime Posture and Observation with Zastava
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, observation model, and implementation strategy for the Zastava module, covering runtime inspection, admission control, drift detection, and posture verification.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
Zastava is the **runtime inspector and enforcer** that provides ground-truth from running environments. Key capabilities:
|
||||
|
||||
- **Runtime Observation** - Inventory containers, track entrypoints, monitor loaded DSOs
|
||||
- **Admission Control** - Kubernetes ValidatingAdmissionWebhook for pre-flight gates
|
||||
- **Drift Detection** - Identify unexpected processes, libraries, and file changes
|
||||
- **Posture Verification** - Validate signatures, SBOM referrers, attestations
|
||||
- **Build-ID Tracking** - Correlate binaries to debug symbols and source
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | Runtime Requirements | Use Case |
|
||||
|---------|---------------------|----------|
|
||||
| **Enterprise Security** | Runtime visibility | Post-deploy monitoring |
|
||||
| **Platform Engineering** | Admission gates | Policy enforcement |
|
||||
| **Compliance Teams** | Continuous verification | Runtime attestation |
|
||||
| **DevSecOps** | Drift detection | Configuration management |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most vulnerability scanners focus on build-time analysis. Stella Ops differentiates with:
|
||||
- **Runtime ground-truth** from actual container execution
|
||||
- **DSO tracking** - which libraries are actually loaded
|
||||
- **Entrypoint tracing** - what programs actually run
|
||||
- **Native Kubernetes admission** with policy integration
|
||||
- **Build-ID correlation** for symbol resolution
|
||||
|
||||
---
|
||||
|
||||
## 3. Architecture Overview
|
||||
|
||||
### 3.1 Component Topology
|
||||
|
||||
**Kubernetes Deployment:**
|
||||
```
|
||||
stellaops/zastava-observer # DaemonSet on every node (read-only host mounts)
|
||||
stellaops/zastava-webhook # ValidatingAdmissionWebhook (Deployment, 2+ replicas)
|
||||
```
|
||||
|
||||
**Docker/VM Deployment:**
|
||||
```
|
||||
stellaops/zastava-agent # System service; watches Docker events; observer only
|
||||
```
|
||||
|
||||
### 3.2 Dependencies
|
||||
|
||||
| Dependency | Purpose |
|
||||
|------------|---------|
|
||||
| Authority | OpToks (DPoP/mTLS) for API calls |
|
||||
| Scanner.WebService | Event ingestion, policy decisions |
|
||||
| OCI Registry | Referrer/signature checks |
|
||||
| Container Runtime | containerd/CRI-O/Docker interfaces |
|
||||
| Kubernetes API | Pod watching, admission webhook |
|
||||
|
||||
---
|
||||
|
||||
## 4. Runtime Event Model
|
||||
|
||||
### 4.1 Event Types
|
||||
|
||||
| Kind | Trigger | Payload |
|
||||
|------|---------|---------|
|
||||
| `CONTAINER_START` | Container lifecycle | Image, entrypoint, namespace |
|
||||
| `CONTAINER_STOP` | Container termination | Exit code, duration |
|
||||
| `DRIFT` | Unexpected change | Changed files, new binaries |
|
||||
| `POLICY_VIOLATION` | Rule breach | Reason, severity |
|
||||
| `ATTESTATION_STATUS` | Verification result | Signed, SBOM present |
|
||||
|
||||
### 4.2 Event Envelope
|
||||
|
||||
```json
|
||||
{
|
||||
"eventId": "uuid",
|
||||
"when": "2025-11-29T12:00:00Z",
|
||||
"kind": "CONTAINER_START",
|
||||
"tenant": "acme-corp",
|
||||
"node": "worker-node-01",
|
||||
"runtime": {
|
||||
"engine": "containerd",
|
||||
"version": "1.7.19"
|
||||
},
|
||||
"workload": {
|
||||
"platform": "kubernetes",
|
||||
"namespace": "production",
|
||||
"pod": "api-7c9fbbd8b7-ktd84",
|
||||
"container": "api",
|
||||
"containerId": "containerd://abc123...",
|
||||
"imageRef": "ghcr.io/acme/api@sha256:def456...",
|
||||
"owner": {
|
||||
"kind": "Deployment",
|
||||
"name": "api"
|
||||
}
|
||||
},
|
||||
"process": {
|
||||
"pid": 12345,
|
||||
"entrypoint": ["/entrypoint.sh", "--serve"],
|
||||
"entryTrace": [
|
||||
{"file": "/entrypoint.sh", "line": 3, "op": "exec", "target": "/usr/bin/python3"},
|
||||
{"file": "<argv>", "op": "python", "target": "/opt/app/server.py"}
|
||||
],
|
||||
"buildId": "9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1"
|
||||
},
|
||||
"loadedLibs": [
|
||||
{"path": "/lib/x86_64-linux-gnu/libssl.so.3", "inode": 123456, "sha256": "..."},
|
||||
{"path": "/usr/lib/x86_64-linux-gnu/libcrypto.so.3", "inode": 123457, "sha256": "..."}
|
||||
],
|
||||
"posture": {
|
||||
"imageSigned": true,
|
||||
"sbomReferrer": "present",
|
||||
"attestation": {
|
||||
"uuid": "rekor-uuid",
|
||||
"verified": true
|
||||
}
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 5. Observer Capabilities
|
||||
|
||||
### 5.1 Container Lifecycle Tracking
|
||||
|
||||
- Watch container start/stop via CRI socket
|
||||
- Resolve container to image digest
|
||||
- Map mount points and rootfs paths
|
||||
- Track container metadata (labels, annotations)
|
||||
|
||||
### 5.2 Entrypoint Tracing
|
||||
|
||||
- Attach a short-lived `nsenter` session to the container's PID 1
|
||||
- Parse shell scripts for exec chain
|
||||
- Record terminal program (actual binary)
|
||||
- Bounded depth to prevent infinite loops
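The tracing steps above can be sketched roughly as follows. This is a minimal illustration, not the observer's actual implementation: the function name `trace_exec_chain`, the in-memory `scripts` map (standing in for files read from the container rootfs), and the default depth of 8 are all assumptions. It only follows literal `exec <target>` lines, which is far simpler than real shell parsing.

```python
import shlex


def trace_exec_chain(entry, scripts, max_depth=8):
    """Follow `exec` hand-offs between shell scripts.

    `scripts` maps a path to the script text; any path missing from the
    map is treated as a non-script binary. Returns (terminal, trace), or
    (None, trace) when the depth bound is hit (for example an exec loop).
    """
    trace = []
    current = entry
    for _ in range(max_depth):
        text = scripts.get(current)
        if text is None:
            return current, trace  # not a script: this is the terminal program
        target = None
        for lineno, line in enumerate(text.splitlines(), start=1):
            try:
                tokens = shlex.split(line, comments=True)
            except ValueError:
                continue  # unbalanced quotes; skip lines we cannot tokenize
            if len(tokens) >= 2 and tokens[0] == "exec":
                target = tokens[1]
                trace.append(
                    {"file": current, "line": lineno, "op": "exec", "target": target}
                )
                break
        if target is None:
            return current, trace  # script never execs: it is the terminal program
        current = target
    return None, trace  # bound reached without finding a terminal program
```

Returning `None` when the bound is reached lets the caller report an incomplete trace instead of guessing a terminal program.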
|
||||
|
||||
### 5.3 Loaded Library Sampling
|
||||
|
||||
- Read `/proc/<pid>/maps` for loaded DSOs
|
||||
- Compute SHA-256 for each mapped file
|
||||
- Track GNU build-IDs for symbol correlation
|
||||
- Rate limits prevent resource exhaustion
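A minimal sketch of the sampling steps above, assuming the standard Linux `/proc/<pid>/maps` line layout (`address perms offset dev inode pathname`). The helper names `parse_loaded_dsos` and `hash_stream` are hypothetical, and the limits only mirror the `maxLibs` and `maxHashBytesPerContainer` settings shown in the configuration section; the real observer may account for its budget differently.

```python
import hashlib
import io


def parse_loaded_dsos(maps_text, max_libs=256):
    """Return unique file-backed mappings whose file name looks like a shared object."""
    seen = []
    for line in maps_text.splitlines():
        fields = line.split(None, 5)
        if len(fields) < 6:
            continue  # anonymous mapping without a backing file
        path = fields[5].strip()
        if not path.startswith("/"):
            continue  # pseudo-entries such as [heap], [stack], [vdso]
        if ".so" not in path.rsplit("/", 1)[-1]:
            continue  # main executable or data file, not a DSO
        if path not in seen:
            seen.append(path)
        if len(seen) >= max_libs:
            break
    return seen


def hash_stream(stream, budget_bytes, chunk_size=65536):
    """SHA-256 a binary stream; return (hexdigest, bytes_read).

    The digest is None when reading would exceed the byte budget.
    """
    digest = hashlib.sha256()
    used = 0
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            return digest.hexdigest(), used
        used += len(chunk)
        if used > budget_bytes:
            return None, used
        digest.update(chunk)
```

In use, a caller would open each returned path in binary mode (resolved under the host procfs mount) and subtract `bytes_read` from the remaining per-container budget after every file.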
|
||||
|
||||
### 5.4 Posture Verification
|
||||
|
||||
- Image signature presence (cosign policies)
|
||||
- SBOM referrers check (registry HEAD)
|
||||
- Rekor attestation lookup via Scanner.WebService
|
||||
- Policy verdict from backend
|
||||
|
||||
---
|
||||
|
||||
## 6. Admission Control
|
||||
|
||||
### 6.1 Gate Criteria
|
||||
|
||||
| Criterion | Description | Configurable |
|
||||
|-----------|-------------|--------------|
|
||||
| Image Signature | Cosign-verifiable to configured keys | Yes |
|
||||
| SBOM Availability | CycloneDX referrer or catalog entry | Yes |
|
||||
| Policy Verdict | Backend PASS required | Yes |
|
||||
| Registry Allowlist | Permitted registries | Yes |
|
||||
| Tag Bans | Reject `:latest`, etc. | Yes |
|
||||
| Base Image Allowlist | Permitted base digests | Yes |
|
||||
|
||||
### 6.2 Decision Flow
|
||||
|
||||
```mermaid
|
||||
sequenceDiagram
|
||||
participant K8s as API Server
|
||||
participant WH as Zastava Webhook
|
||||
participant SW as Scanner.WebService
|
||||
|
||||
K8s->>WH: AdmissionReview(Pod)
|
||||
WH->>WH: Resolve images to digests
|
||||
WH->>SW: POST /policy/runtime
|
||||
SW-->>WH: {signed, hasSbom, verdict, reasons}
|
||||
alt All pass
|
||||
WH-->>K8s: Allow
|
||||
else Any fail
|
||||
WH-->>K8s: Deny (with reasons)
|
||||
end
|
||||
```
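The allow/deny branch in the flow above could be expressed along these lines. This is a sketch under assumptions: `decide_admission` is a hypothetical name, the per-image result shape follows the `{signed, hasSbom, verdict, reasons}` response in the diagram, the `verify` keys follow the `admission.verify` block in the configuration section, and the literal verdict value `"pass"` is a guess at how the backend encodes a passing verdict.

```python
def decide_admission(image_results, verify):
    """Return (allowed, reasons) for a pod.

    `image_results` maps an image digest to the backend result for it;
    `verify` holds the enabled checks. Any failed enabled check denies.
    """
    reasons = []
    for digest, result in image_results.items():
        if verify.get("imageSignature", True) and not result.get("signed", False):
            reasons.append(f"{digest}: image signature not verified")
        if verify.get("sbomReferrer", True) and not result.get("hasSbom", False):
            reasons.append(f"{digest}: no SBOM referrer found")
        if verify.get("scannerPolicyPass", True) and result.get("verdict") != "pass":
            backend_reasons = result.get("reasons") or ["policy verdict is not pass"]
            reasons.extend(f"{digest}: {reason}" for reason in backend_reasons)
    return (not reasons), reasons
```

Collecting every reason, rather than stopping at the first failure, gives the deny message enough detail for the pod author to fix all problems in one pass.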
|
||||
|
||||
### 6.3 Response Caching
|
||||
|
||||
- Per-digest results are cached for a configurable TTL (`cacheTtlSeconds`, default 300s)
|
||||
- Fail-open or fail-closed behavior is set per namespace (see `failOpenNamespaces`)
|
||||
- Cache invalidation on policy updates
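A rough sketch of the caching behavior listed above, assuming an in-process cache keyed by image digest. `VerdictCache` and `allow_on_backend_failure` are hypothetical names, and the injectable `clock` exists only to make the expiry logic easy to exercise.

```python
import time


class VerdictCache:
    """Per-digest verdict cache with a fixed time-to-live."""

    def __init__(self, ttl_seconds=300, clock=time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock
        self._entries = {}  # digest -> (expires_at, verdict)

    def get(self, digest):
        entry = self._entries.get(digest)
        if entry is None:
            return None
        expires_at, verdict = entry
        if self._clock() >= expires_at:
            del self._entries[digest]  # expired: force a fresh backend call
            return None
        return verdict

    def put(self, digest, verdict):
        self._entries[digest] = (self._clock() + self._ttl, verdict)

    def invalidate_all(self):
        """Drop every entry, for example when a policy update is announced."""
        self._entries.clear()


def allow_on_backend_failure(namespace, fail_open_namespaces):
    """Fail open only for explicitly listed namespaces; fail closed otherwise."""
    return namespace in fail_open_namespaces
```

Keying on the digest rather than the tag means a re-pushed tag never reuses a stale verdict.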
|
||||
|
||||
---
|
||||
|
||||
## 7. Drift Detection
|
||||
|
||||
### 7.1 Signal Types
|
||||
|
||||
| Signal | Detection Method | Action |
|
||||
|--------|-----------------|--------|
|
||||
| Process Drift | Terminal program differs from EntryTrace baseline | Alert |
|
||||
| Library Drift | Loaded DSOs not in Usage SBOM | Alert, delta scan |
|
||||
| Filesystem Drift | New executables with mtime after image creation | Alert |
|
||||
| Network Drift | Unexpected listening ports | Alert (optional) |
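As an illustration of the first two rows of the table, process and library drift reduce to a comparison against a baseline. The sketch below assumes the baseline is available as a terminal program path plus a list of library paths from the Usage SBOM, and that libraries are matched by path; the function name `detect_drift` and the shape of the returned signals are hypothetical.

```python
def detect_drift(baseline_terminal, observed_terminal, usage_sbom_paths, loaded_libs):
    """Compare an observation against its baseline and list drift signals.

    `loaded_libs` uses the `loadedLibs` shape from the event envelope
    (a list of dicts with at least a "path" key).
    """
    signals = []
    if observed_terminal != baseline_terminal:
        signals.append(
            {"signal": "process", "expected": baseline_terminal, "observed": observed_terminal}
        )
    known = set(usage_sbom_paths)
    unexpected = sorted({lib["path"] for lib in loaded_libs} - known)
    if unexpected:
        signals.append({"signal": "library", "unexpected": unexpected})
    return signals
```

Matching by SHA-256 instead of path would also catch a library replaced in place; the path comparison here is only the simplest variant.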
|
||||
|
||||
### 7.2 Drift Event
|
||||
|
||||
```json
|
||||
{
|
||||
"kind": "DRIFT",
|
||||
"delta": {
|
||||
"baselineImageDigest": "sha256:abc...",
|
||||
"changedFiles": ["/opt/app/server.py"],
|
||||
"newBinaries": [
|
||||
{"path": "/usr/local/bin/helper", "sha256": "..."}
|
||||
]
|
||||
},
|
||||
"evidence": [
|
||||
{"signal": "procfs.maps", "value": "/lib/.../libssl.so.3@0x7f..."},
|
||||
{"signal": "cri.task.inspect", "value": "pid=12345"}
|
||||
]
|
||||
}
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 8. Build-ID Workflow
|
||||
|
||||
### 8.1 Capture
|
||||
|
||||
1. Observer extracts `NT_GNU_BUILD_ID` from `/proc/<pid>/exe`
|
||||
2. Normalize to lower-case hex
|
||||
3. Include in runtime event as `process.buildId`
|
||||
|
||||
### 8.2 Correlation
|
||||
|
||||
1. Scanner.WebService persists observation
|
||||
2. Policy responses include `buildIds` list
|
||||
3. Debug files matched via `.build-id/<aa>/<rest>.debug`
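The normalization and path layout described in the capture and correlation steps can be sketched as below. The helper names are hypothetical, and the default root `/usr/lib/debug` is the conventional debug-file directory on many Linux distributions, used here as an assumption rather than a documented Stella Ops setting.

```python
import re

_HEX = re.compile(r"[0-9a-f]+")


def normalize_build_id(raw):
    """Lower-case a hex build-ID and reject anything that is not whole bytes of hex."""
    build_id = raw.strip().lower()
    if len(build_id) < 4 or len(build_id) % 2 or not _HEX.fullmatch(build_id):
        raise ValueError(f"not a valid hex build-ID: {raw!r}")
    return build_id


def debug_file_path(build_id, root="/usr/lib/debug"):
    """Map a build-ID to its `.build-id/<aa>/<rest>.debug` location."""
    build_id = normalize_build_id(build_id)
    return f"{root}/.build-id/{build_id[:2]}/{build_id[2:]}.debug"
```

For the build-ID used in the event envelope, `debug_file_path("9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1")` resolves to a file under `.build-id/9f/`.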
|
||||
|
||||
### 8.3 Symbol Resolution
|
||||
|
||||
```bash
|
||||
# Via CLI
|
||||
stella runtime policy test --image sha256:abc123... | jq '.buildIds'
|
||||
|
||||
# Via debuginfod
|
||||
debuginfod-find debuginfo 9f3a1cd4c0b7adfe91c0e3b51d2f45fb0f76a4c1
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## 9. Implementation Strategy
|
||||
|
||||
### 9.1 Phase 1: Observer Core (Complete)
|
||||
|
||||
- [x] CRI socket integration
|
||||
- [x] Container lifecycle tracking
|
||||
- [x] Entrypoint tracing
|
||||
- [x] Loaded library sampling
|
||||
- [x] Event batching and compression
|
||||
|
||||
### 9.2 Phase 2: Admission Webhook (Complete)
|
||||
|
||||
- [x] ValidatingAdmissionWebhook
|
||||
- [x] Image digest resolution
|
||||
- [x] Policy integration
|
||||
- [x] Response caching
|
||||
- [x] Fail-open/closed modes
|
||||
|
||||
### 9.3 Phase 3: Drift Detection (In Progress)
|
||||
|
||||
- [x] Process drift detection
|
||||
- [x] Library drift detection
|
||||
- [ ] Filesystem drift monitoring (ZASTAVA-DRIFT-50-001)
|
||||
- [ ] Network posture checks (ZASTAVA-NET-51-001)
|
||||
|
||||
### 9.4 Phase 4: Advanced Features (Planned)
|
||||
|
||||
- [ ] eBPF syscall tracing (optional)
|
||||
- [ ] Windows container support
|
||||
- [ ] Live used-by-entrypoint synthesis
|
||||
- [ ] Admission dry-run dashboards
|
||||
|
||||
---
|
||||
|
||||
## 10. Configuration
|
||||
|
||||
```yaml
|
||||
zastava:
|
||||
mode:
|
||||
observer: true
|
||||
webhook: true
|
||||
|
||||
backend:
|
||||
baseAddress: "https://scanner-web.internal"
|
||||
policyPath: "/api/v1/scanner/policy/runtime"
|
||||
requestTimeoutSeconds: 5
|
||||
|
||||
runtime:
|
||||
authority:
|
||||
issuer: "https://authority.internal"
|
||||
clientId: "zastava-observer"
|
||||
audience: ["scanner", "zastava"]
|
||||
scopes: ["api:scanner.runtime.write"]
|
||||
requireDpop: true
|
||||
requireMutualTls: true
|
||||
|
||||
tenant: "acme-corp"
|
||||
engine: "auto" # containerd|cri-o|docker|auto
|
||||
procfs: "/host/proc"
|
||||
|
||||
collect:
|
||||
entryTrace: true
|
||||
loadedLibs: true
|
||||
maxLibs: 256
|
||||
maxHashBytesPerContainer: 64000000
|
||||
|
||||
admission:
|
||||
enforce: true
|
||||
failOpenNamespaces: ["dev", "test"]
|
||||
verify:
|
||||
imageSignature: true
|
||||
sbomReferrer: true
|
||||
scannerPolicyPass: true
|
||||
cacheTtlSeconds: 300
|
||||
|
||||
limits:
|
||||
eventsPerSecond: 50
|
||||
burst: 200
|
||||
perNodeQueue: 10000
|
||||
```
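The `limits` block pairs a sustained rate (`eventsPerSecond`) with a `burst` allowance. One common way to realize such a pair is a token bucket, sketched below; the document does not state which algorithm Zastava uses, so treat the class `TokenBucket` and its behavior as an assumption for illustration.

```python
import time


class TokenBucket:
    """Allow `rate_per_second` events on average, with bursts up to `burst`."""

    def __init__(self, rate_per_second=50, burst=200, clock=time.monotonic):
        self._rate = rate_per_second
        self._capacity = burst
        self._tokens = float(burst)  # start full so an initial burst is allowed
        self._clock = clock
        self._last = clock()

    def try_acquire(self, n=1):
        """Take `n` tokens if available; return False when the event should be dropped or queued."""
        now = self._clock()
        refill = (now - self._last) * self._rate
        self._tokens = min(self._capacity, self._tokens + refill)
        self._last = now
        if self._tokens >= n:
            self._tokens -= n
            return True
        return False
```

With the defaults above, a node can emit 200 events at once and then sustains 50 per second; events rejected by `try_acquire` would go to the bounded per-node queue or be counted as drops.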
|
||||
|
||||
---
|
||||
|
||||
## 11. Security Posture
|
||||
|
||||
### 11.1 Privileges
|
||||
|
||||
| Capability | Purpose | Mode |
|
||||
|------------|---------|------|
|
||||
| `CAP_SYS_PTRACE` | nsenter trace | Optional |
|
||||
| `CAP_DAC_READ_SEARCH` | Read /proc | Required |
|
||||
| Host PID namespace | Container PIDs | Required |
|
||||
| Read-only mounts | /proc, sockets | Required |
|
||||
|
||||
### 11.2 Least Privilege
|
||||
|
||||
- No write mounts
|
||||
- No host networking
|
||||
- No privilege escalation
|
||||
- Read-only rootfs
|
||||
|
||||
### 11.3 Data Minimization
|
||||
|
||||
- No env var exfiltration
|
||||
- No command argument logging (unless diagnostic mode is enabled)
|
||||
- Rate limits prevent abuse
|
||||
|
||||
---
|
||||
|
||||
## 12. Observability
|
||||
|
||||
### 12.1 Observer Metrics
|
||||
|
||||
- `zastava.runtime.events.total{kind}`
|
||||
- `zastava.runtime.backend.latency.ms{endpoint}`
|
||||
- `zastava.proc_maps.samples.total{result}`
|
||||
- `zastava.entrytrace.depth{p99}`
|
||||
- `zastava.hash.bytes.total`
|
||||
- `zastava.buffer.drops.total`
|
||||
|
||||
### 12.2 Webhook Metrics
|
||||
|
||||
- `zastava.admission.decisions.total{decision}`
|
||||
- `zastava.admission.cache.hits.total`
|
||||
- `zastava.backend.failures.total`
|
||||
|
||||
---
|
||||
|
||||
## 13. Performance Targets
|
||||
|
||||
| Operation | Target |
|
||||
|-----------|--------|
|
||||
| `/proc/<pid>/maps` sampling | < 30ms (64 files) |
|
||||
| Full library hash set | < 200ms (256 libs) |
|
||||
| Admission with warm cache | < 8ms p95 |
|
||||
| Admission with backend call | < 50ms p95 |
|
||||
| Event throughput | 5k events/min/node |
|
||||
|
||||
---
|
||||
|
||||
## 14. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| Zastava architecture | `docs/modules/zastava/architecture.md` |
|
||||
| Runtime event schema | `docs/modules/zastava/event-schema.md` |
|
||||
| Admission configuration | `docs/modules/zastava/admission-config.md` |
|
||||
| Deployment guide | `docs/modules/zastava/deployment.md` |
|
||||
|
||||
---
|
||||
|
||||
## 15. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0144_0001_0001_zastava_runtime_signals.md
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0140_0001_0001_runtime_signals.md
|
||||
- SPRINT_0143_0000_0001_signals.md
|
||||
|
||||
**Key Task IDs:**
|
||||
- `ZASTAVA-OBS-40-001` - Observer core (DONE)
|
||||
- `ZASTAVA-ADM-41-001` - Admission webhook (DONE)
|
||||
- `ZASTAVA-DRIFT-50-001` - Filesystem drift (IN PROGRESS)
|
||||
- `ZASTAVA-NET-51-001` - Network posture (TODO)
|
||||
- `ZASTAVA-EBPF-60-001` - eBPF integration (FUTURE)
|
||||
|
||||
---
|
||||
|
||||
## 16. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Event capture rate | 99.9% of container starts |
|
||||
| Admission latency | < 50ms p95 |
|
||||
| Drift detection rate | 100% of runtime changes |
|
||||
| False positive rate | < 1% of drift alerts |
|
||||
| Node resource usage | < 2% CPU, < 100MB RAM |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -0,0 +1,373 @@
|
||||
# Telemetry and Observability Patterns
|
||||
|
||||
**Version:** 1.0
|
||||
**Date:** 2025-11-29
|
||||
**Status:** Canonical
|
||||
|
||||
This advisory defines the product rationale, collector topology, and implementation strategy for the Telemetry module, covering metrics, traces, logs, forensic pipelines, and offline packaging.
|
||||
|
||||
---
|
||||
|
||||
## 1. Executive Summary
|
||||
|
||||
The Telemetry module provides **unified observability infrastructure** across all Stella Ops components. Key capabilities:
|
||||
|
||||
- **OpenTelemetry Native** - OTLP collection for metrics, traces, logs
|
||||
- **Forensic Mode** - Extended retention and 100% sampling during incidents
|
||||
- **Profile-Based Configuration** - Default, forensic, and air-gap profiles
|
||||
- **Sealed-Mode Guards** - Automatic exporter restrictions in air-gap
|
||||
- **Offline Bundles** - Signed OTLP archives for compliance
|
||||
|
||||
---
|
||||
|
||||
## 2. Market Drivers
|
||||
|
||||
### 2.1 Target Segments
|
||||
|
||||
| Segment | Observability Requirements | Use Case |
|
||||
|---------|---------------------------|----------|
|
||||
| **Platform Ops** | Real-time monitoring | Operational health |
|
||||
| **Security Teams** | Forensic investigation | Incident response |
|
||||
| **Compliance** | Audit trails | SOC 2, FedRAMP |
|
||||
| **DevSecOps** | Pipeline visibility | CI/CD debugging |
|
||||
|
||||
### 2.2 Competitive Positioning
|
||||
|
||||
Most vulnerability tools provide minimal observability. Stella Ops differentiates with:
|
||||
- **Built-in OpenTelemetry** across all services
|
||||
- **Forensic mode** with automatic retention extension
|
||||
- **Sealed-mode compatibility** for air-gap
|
||||
- **Signed OTLP bundles** for compliance archives
|
||||
- **Incident-triggered sampling** escalation
|
||||
|
||||
---
|
||||
|
||||
## 3. Collector Topology
|
||||
|
||||
### 3.1 Architecture
|
||||
|
||||
```
|
||||
┌─────────────────────────────────────────────────────┐
|
||||
│ Services │
|
||||
│ Scanner │ Policy │ Authority │ Orchestrator │ ... │
|
||||
└─────────────────────┬───────────────────────────────┘
|
||||
│ OTLP/gRPC
|
||||
▼
|
||||
┌─────────────────────────────────────────────────────┐
|
||||
│ OpenTelemetry Collector │
|
||||
│ ┌─────────┐ ┌──────────┐ ┌─────────────────────┐ │
|
||||
│ │ Traces │ │ Metrics │ │ Logs │ │
|
||||
│ └────┬────┘ └────┬─────┘ └──────────┬──────────┘ │
|
||||
│ │ Tail │ Batch │ Redaction │
|
||||
│ │ Sampling │ │ │
|
||||
└───────┼────────────┼─────────────────┼─────────────┘
|
||||
│ │ │
|
||||
▼ ▼ ▼
|
||||
┌────────┐ ┌──────────┐ ┌────────┐
|
||||
│ Tempo │ │Prometheus│ │ Loki │
|
||||
└────────┘ └──────────┘ └────────┘
|
||||
```
|
||||
|
||||
### 3.2 Collector Profiles
|
||||
|
||||
| Profile | Use Case | Configuration |
|
||||
|---------|----------|---------------|
|
||||
| **default** | Normal operation | 10% trace sampling, 30-day retention |
|
||||
| **forensic** | Investigation mode | 100% sampling, 180-day retention |
|
||||
| **airgap** | Offline deployment | File exporters, no external network |
|
||||
|
||||
---
|
||||
|
||||
## 4. Metrics
|
||||
|
||||
### 4.1 Standard Metrics
|
||||
|
||||
| Metric | Type | Labels | Description |
|
||||
|--------|------|--------|-------------|
|
||||
| `stellaops_request_duration_seconds` | Histogram | service, endpoint | Request latency |
|
||||
| `stellaops_request_total` | Counter | service, status | Request count |
|
||||
| `stellaops_active_jobs` | Gauge | tenant, jobType | Active job count |
|
||||
| `stellaops_queue_depth` | Gauge | queue | Queue depth |
|
||||
| `stellaops_scan_duration_seconds` | Histogram | tenant | Scan duration |
|
||||
|
||||
### 4.2 Module-Specific Metrics
|
||||
|
||||
**Policy Engine:**
|
||||
- `policy_run_seconds{mode,tenant,policy}`
|
||||
- `policy_rules_fired_total{policy,rule}`
|
||||
- `policy_vex_overrides_total{policy,vendor}`
|
||||
|
||||
**Scanner:**
|
||||
- `scanner_sbom_components_total{ecosystem}`
|
||||
- `scanner_vulnerabilities_found_total{severity}`
|
||||
- `scanner_attestations_logged_total`
|
||||
|
||||
**Authority:**
|
||||
- `authority_token_issued_total{grant_type,audience}`
|
||||
- `authority_token_rejected_total{reason}`
|
||||
- `authority_dpop_nonce_miss_total`
|
||||
|
||||
---
|
||||
|
||||
## 5. Traces
|
||||
|
||||
### 5.1 Trace Context
|
||||
|
||||
All services propagate W3C Trace Context:
|
||||
- `traceparent` header
|
||||
- `tracestate` for vendor-specific data
|
||||
- `baggage` for cross-service attributes
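For reference, a version-00 `traceparent` value has four lowercase-hex fields: `version-traceid-parentid-flags`. The sketch below parses that form only and is deliberately stricter than a production propagator, which should come from the OpenTelemetry SDK; the function name `parse_traceparent` is hypothetical.

```python
import re

_TRACEPARENT = re.compile(
    r"([0-9a-f]{2})-([0-9a-f]{32})-([0-9a-f]{16})-([0-9a-f]{2})"
)


def parse_traceparent(header):
    """Parse a four-field traceparent header; return None if it is invalid."""
    match = _TRACEPARENT.fullmatch(header.strip())
    if match is None:
        return None
    version, trace_id, parent_id, flags = match.groups()
    # Version ff and all-zero IDs are invalid per the W3C specification.
    if version == "ff" or trace_id == "0" * 32 or parent_id == "0" * 16:
        return None
    return {
        "version": version,
        "trace_id": trace_id,
        "parent_id": parent_id,
        "sampled": bool(int(flags, 16) & 0x01),
    }
```

An invalid header yields `None`, in which case a service would start a new trace rather than propagate a broken context.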
|
||||
|
||||
### 5.2 Span Conventions
|
||||
|
||||
| Span | Attributes | Description |
|
||||
|------|------------|-------------|
|
||||
| `http.request` | url, method, status | HTTP handler |
|
||||
| `db.query` | collection, operation | MongoDB ops |
|
||||
| `policy.evaluate` | policyId, version | Policy run |
|
||||
| `scan.image` | imageRef, digest | Image scan |
|
||||
| `sign.dsse` | predicateType | DSSE signing |
|
||||
|
||||
### 5.3 Sampling Strategy
|
||||
|
||||
**Default (Tail Sampling):**
|
||||
- Error traces: 100%
|
||||
- Slow traces (>2s): 100%
|
||||
- Normal traces: 10%
|
||||
|
||||
**Forensic Mode:**
|
||||
- All traces: 100%
|
||||
- Extended attributes enabled
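The sampling rules above amount to a small decision function evaluated once a trace is complete. The sketch below is an assumption-laden illustration: in practice this logic would live in the collector's tail-sampling configuration, and `keep_trace`, its parameters, and the use of the trace ID's low bits for the 10% decision are hypothetical choices.

```python
def keep_trace(trace_id, has_error, duration_seconds,
               forensic=False, normal_rate=0.10, slow_threshold_seconds=2.0):
    """Decide whether a finished trace is retained.

    `trace_id` is the 32-character hex trace ID. Forensic mode, errors,
    and slow traces are always kept; the rest are sampled at `normal_rate`.
    """
    if forensic or has_error or duration_seconds > slow_threshold_seconds:
        return True
    # Map the low 32 bits of the trace ID onto [0, 1) so that every
    # collector instance makes the same decision for the same trace.
    bucket = int(trace_id[-8:], 16) / 0x100000000
    return bucket < normal_rate
```

Deriving the decision from the trace ID, instead of a random draw, keeps the outcome reproducible across collector replicas.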
|
||||
|
||||
---
|
||||
|
||||
## 6. Logs
|
||||
|
||||
### 6.1 Structured Format
|
||||
|
||||
```json
|
||||
{
|
||||
"timestamp": "2025-11-29T12:00:00.123Z",
|
||||
"level": "info",
|
||||
"message": "Scan completed",
|
||||
"service": "scanner",
|
||||
"traceId": "abc123...",
|
||||
"spanId": "def456...",
|
||||
"tenant": "acme-corp",
|
||||
"imageDigest": "sha256:...",
|
||||
"componentCount": 245,
|
||||
"vulnerabilityCount": 12
|
||||
}
|
||||
```
|
||||
|
||||
### 6.2 Redaction
|
||||
|
||||
Attribute processors strip sensitive data:
|
||||
- `authorization` headers
|
||||
- `secretRef` values
|
||||
- PII based on allowed-key policy
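One way to read the rules above is as an allow-list with a hard deny-list on top: only explicitly allowed keys survive, and credential-bearing keys are removed even if someone allow-lists them by mistake. The sketch below follows that reading; the function name `redact` and the case-insensitive key matching are assumptions, and the real attribute processors run inside the collector rather than in application code.

```python
ALWAYS_STRIP = {"authorization", "secretref"}


def redact(attributes, allowed_keys):
    """Keep only allow-listed attributes, never the always-stripped ones."""
    allowed = {key.lower() for key in allowed_keys} - ALWAYS_STRIP
    return {
        key: value
        for key, value in attributes.items()
        if key.lower() in allowed
    }
```

An allow-list fails safe: a newly introduced attribute that happens to carry PII is dropped until someone deliberately permits it.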
|
||||
|
||||
### 6.3 Log Levels
|
||||
|
||||
| Level | Purpose | Retention |
|
||||
|-------|---------|-----------|
|
||||
| `error` | Failures | 180 days |
|
||||
| `warn` | Anomalies | 90 days |
|
||||
| `info` | Operations | 30 days |
|
||||
| `debug` | Development | 7 days |
|
||||
|
||||
---
|
||||
|
||||
## 7. Forensic Mode
|
||||
|
||||
### 7.1 Activation
|
||||
|
||||
```bash
|
||||
# Activate forensic mode for tenant
|
||||
stella telemetry incident start --tenant acme-corp --reason "CVE-2025-12345 investigation"
|
||||
|
||||
# Check status
|
||||
stella telemetry incident status
|
||||
|
||||
# Deactivate
|
||||
stella telemetry incident stop --tenant acme-corp
|
||||
```
|
||||
|
||||
### 7.2 Behavior Changes
|
||||
|
||||
| Aspect | Default | Forensic |
|
||||
|--------|---------|----------|
|
||||
| Trace sampling | 10% | 100% |
|
||||
| Log level | info | debug |
|
||||
| Retention | 30 days | 180 days |
|
||||
| Attributes | Standard | Extended |
|
||||
| Export frequency | 1 minute | 10 seconds |
|
||||
|
||||
### 7.3 Automatic Triggers
|
||||
|
||||
- Orchestrator incident escalation
|
||||
- Policy violation threshold exceeded
|
||||
- Circuit breaker activation
|
||||
- Manual operator trigger
|
||||
|
||||
---
|
||||
|
||||
## 8. Implementation Strategy
|
||||
|
||||
### 8.1 Phase 1: Core Telemetry (Complete)
|
||||
|
||||
- [x] OpenTelemetry SDK integration
|
||||
- [x] Metrics exporter (Prometheus)
|
||||
- [x] Trace exporter (Tempo/Jaeger)
|
||||
- [x] Log exporter (Loki)
|
||||
|
||||
### 8.2 Phase 2: Advanced Features (Complete)
|
||||
|
||||
- [x] Tail sampling configuration
|
||||
- [x] Attribute redaction
|
||||
- [x] Profile-based configuration
|
||||
- [x] Dashboard provisioning
|
||||
|
||||
### 8.3 Phase 3: Forensic & Offline (In Progress)
|
||||
|
||||
- [x] Forensic mode toggle
|
||||
- [ ] Forensic bundle export (TELEM-FOR-50-001)
|
||||
- [ ] Sealed-mode guards (TELEM-SEAL-51-001)
|
||||
- [ ] Offline bundle signing (TELEM-SIGN-52-001)
|
||||
|
||||
---
|
||||
|
||||
## 9. API Surface
|
||||
|
||||
### 9.1 Configuration
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/telemetry/config/profile/{name}` | GET | `telemetry:read` | Download collector config |
|
||||
| `/telemetry/config/profiles` | GET | `telemetry:read` | List profiles |
|
||||
|
||||
### 9.2 Incident Mode
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/telemetry/incidents/mode` | POST | `telemetry:admin` | Toggle forensic mode |
|
||||
| `/telemetry/incidents/status` | GET | `telemetry:read` | Current mode status |
|
||||
|
||||
### 9.3 Exports
|
||||
|
||||
| Endpoint | Method | Scope | Description |
|
||||
|----------|--------|-------|-------------|
|
||||
| `/telemetry/exports/forensic/{window}` | GET | `telemetry:export` | Stream OTLP bundle |
|
||||
|
||||
---
|
||||
|
||||
## 10. Offline Support
|
||||
|
||||
### 10.1 Bundle Structure
|
||||
|
||||
```
|
||||
telemetry-bundle/
|
||||
├── otlp/
|
||||
│ ├── metrics.pb
|
||||
│ ├── traces.pb
|
||||
│ └── logs.pb
|
||||
├── config/
|
||||
│ ├── collector.yaml
|
||||
│ └── dashboards/
|
||||
├── manifest.json
|
||||
└── signatures/
|
||||
└── manifest.sig
|
||||
```
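The layout above implies that `manifest.json` describes the bundle contents and `manifest.sig` covers the manifest. The manifest schema is not given in this document, so the sketch below assumes a simple one (relative path mapped to SHA-256 and size) purely to show how deterministic content hashing could work; `build_manifest` and `verify_manifest` are hypothetical names, and signing itself is out of scope here.

```python
import hashlib
import json


def build_manifest(files):
    """Build canonical manifest bytes from {relative_path: content_bytes}.

    The manifest and the signatures directory are excluded from their
    own listing.
    """
    entries = {}
    for rel_path in sorted(files):
        if rel_path == "manifest.json" or rel_path.startswith("signatures/"):
            continue
        content = files[rel_path]
        entries[rel_path] = {
            "sha256": hashlib.sha256(content).hexdigest(),
            "size": len(content),
        }
    # Sorted keys and fixed separators make the bytes reproducible,
    # which matters because the signature covers these exact bytes.
    return json.dumps({"files": entries}, sort_keys=True, separators=(",", ":")).encode("utf-8")


def verify_manifest(manifest_bytes, files):
    """Return the sorted list of paths that are missing or whose hash differs."""
    manifest = json.loads(manifest_bytes)
    mismatched = []
    for rel_path, entry in manifest["files"].items():
        content = files.get(rel_path)
        if content is None or hashlib.sha256(content).hexdigest() != entry["sha256"]:
            mismatched.append(rel_path)
    return sorted(mismatched)
```

Verification on the receiving side would first check `manifest.sig` against the manifest bytes, then run a check like `verify_manifest` over the unpacked files.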
|
||||
|
||||
### 10.2 Sealed-Mode Guards
|
||||
|
||||
```csharp
|
||||
// StellaOps.Telemetry.Core enforces IEgressPolicy
|
||||
if (sealedMode.IsActive)
|
||||
{
|
||||
// Disable non-loopback exporters
|
||||
// Emit structured warning with remediation
|
||||
// Fall back to file-based export
|
||||
}
|
||||
```
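The three commented steps in the guard above can be illustrated with a small, language-neutral sketch, written here in Python rather than C#. It is not the `StellaOps.Telemetry.Core` implementation: the function names, the fallback path, and the rule that only literal loopback addresses and `localhost` count as local are all assumptions.

```python
import ipaddress
from urllib.parse import urlsplit

# Hypothetical default location for file-based export in sealed mode.
FALLBACK_EXPORTER = "file:///var/lib/stellaops/telemetry/otlp"


def is_loopback_endpoint(endpoint):
    """True when the endpoint host is `localhost` or a literal loopback IP."""
    host = urlsplit(endpoint).hostname
    if host is None:
        return False
    if host == "localhost":
        return True
    try:
        return ipaddress.ip_address(host).is_loopback
    except ValueError:
        return False  # a DNS name is treated as external in sealed mode


def apply_sealed_mode(exporters, sealed):
    """Return (active_exporters, warnings) after applying the sealed-mode guard."""
    if not sealed:
        return list(exporters), []
    kept, warnings = [], []
    for endpoint in exporters:
        if endpoint.startswith("file://") or is_loopback_endpoint(endpoint):
            kept.append(endpoint)
        else:
            warnings.append(
                f"sealed mode: disabled exporter {endpoint}; "
                "configure a loopback or file exporter instead"
            )
    # Fall back to file export if something was disabled and no file sink remains.
    if warnings and not any(e.startswith("file://") for e in kept):
        kept.append(FALLBACK_EXPORTER)
    return kept, warnings
```

Treating unresolved DNS names as external avoids a lookup that could itself leave the sealed environment.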
|
||||
|
||||
---
|
||||
|
||||
## 11. Dashboards & Alerts
|
||||
|
||||
### 11.1 Standard Dashboards
|
||||
|
||||
| Dashboard | Purpose | Panels |
|
||||
|-----------|---------|--------|
|
||||
| Platform Health | Overall status | Request rate, error rate, latency |
|
||||
| Scan Operations | Scanner metrics | Scan rate, duration, findings |
|
||||
| Policy Engine | Policy metrics | Evaluation rate, rule hits, verdicts |
|
||||
| Job Orchestration | Queue metrics | Queue depth, job latency, failures |
|
||||
|
||||
### 11.2 Alert Rules
|
||||
|
||||
| Alert | Condition | Severity |
|
||||
|-------|-----------|----------|
|
||||
| High Error Rate | error_rate > 5% | critical |
|
||||
| Slow Scans | p95 > 5m | warning |
|
||||
| Queue Backlog | depth > 1000 | warning |
|
||||
| Circuit Open | breaker_open = 1 | critical |
|
||||
|
||||
---
|
||||
|
||||
## 12. Security Considerations
|
||||
|
||||
### 12.1 Data Protection
|
||||
|
||||
- Sensitive attributes redacted at collection
|
||||
- Encrypted in transit (TLS)
|
||||
- Encrypted at rest (storage layer)
|
||||
- Retention policies enforced
|
||||
|
||||
### 12.2 Access Control
|
||||
|
||||
- Authority scopes for API access
|
||||
- Tenant isolation in queries
|
||||
- Audit logging for forensic access
|
||||
|
||||
---
|
||||
|
||||
## 13. Related Documentation
|
||||
|
||||
| Resource | Location |
|
||||
|----------|----------|
|
||||
| Telemetry architecture | `docs/modules/telemetry/architecture.md` |
|
||||
| Collector configuration | `docs/modules/telemetry/collector-config.md` |
|
||||
| Dashboard provisioning | `docs/modules/telemetry/dashboards.md` |
|
||||
|
||||
---
|
||||
|
||||
## 14. Sprint Mapping
|
||||
|
||||
- **Primary Sprint:** SPRINT_0180_0001_0001_telemetry_core.md (NEW)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0181_0001_0002_telemetry_forensic.md
|
||||
- SPRINT_0182_0001_0003_telemetry_offline.md
|
||||
|
||||
**Key Task IDs:**
|
||||
- `TELEM-CORE-40-001` - SDK integration (DONE)
|
||||
- `TELEM-DASH-41-001` - Dashboard provisioning (DONE)
|
||||
- `TELEM-FOR-50-001` - Forensic bundles (IN PROGRESS)
|
||||
- `TELEM-SEAL-51-001` - Sealed-mode guards (TODO)
|
||||
- `TELEM-SIGN-52-001` - Bundle signing (TODO)
|
||||
|
||||
---
|
||||
|
||||
## 15. Success Metrics
|
||||
|
||||
| Metric | Target |
|
||||
|--------|--------|
|
||||
| Collection overhead | < 2% CPU |
|
||||
| Trace sampling accuracy | 100% for errors |
|
||||
| Log ingestion latency | < 5 seconds |
|
||||
| Forensic activation time | < 30 seconds |
|
||||
| Bundle export time | < 5 minutes (24h data) |
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-11-29*
|
||||
@@ -157,6 +157,107 @@ These are the authoritative advisories to reference for implementation:
|
||||
- `docs/security/dpop-mtls-rollout.md` - Sender constraints
|
||||
- **Status:** Fills HIGH-priority gap - consolidates token model, scopes, multi-tenant isolation
|
||||
|
||||
### CLI Developer Experience & Command UX
|
||||
- **Canonical:** `29-Nov-2025 - CLI Developer Experience and Command UX.md`
|
||||
- **Sprint:** SPRINT_0201_0001_0001_cli_i.md (PRIMARY)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_203_cli_iii.md
|
||||
- SPRINT_205_cli_v.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/cli/architecture.md` - Module architecture
|
||||
- `docs/09_API_CLI_REFERENCE.md` - Command reference
|
||||
- **Status:** Fills HIGH-priority gap - covers command surface, auth model, Buildx integration
|
||||
|
||||
### Orchestrator Event Model & Job Lifecycle
|
||||
- **Canonical:** `29-Nov-2025 - Orchestrator Event Model and Job Lifecycle.md`
|
||||
- **Sprint:** SPRINT_0151_0001_0001_orchestrator_i.md (PRIMARY)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_152_orchestrator_ii.md
|
||||
- SPRINT_0152_0001_0002_orchestrator_ii.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/orchestrator/architecture.md` - Module architecture
|
||||
- **Status:** Fills HIGH-priority gap - covers job lifecycle, quota governance, replay semantics
|
||||
|
||||
### Export Center & Reporting Strategy
|
||||
- **Canonical:** `29-Nov-2025 - Export Center and Reporting Strategy.md`
|
||||
- **Sprint:** SPRINT_0160_0001_0001_export_evidence.md (PRIMARY)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0161_0001_0001_evidencelocker.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/export-center/architecture.md` - Module architecture
|
||||
- **Status:** Fills MEDIUM-priority gap - covers profile system, adapters, distribution channels
|
||||
|
||||
### Runtime Posture & Observation (Zastava)
|
||||
- **Canonical:** `29-Nov-2025 - Runtime Posture and Observation with Zastava.md`
|
||||
- **Sprint:** SPRINT_0144_0001_0001_zastava_runtime_signals.md (PRIMARY)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0140_0001_0001_runtime_signals.md
|
||||
- SPRINT_0143_0000_0001_signals.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/zastava/architecture.md` - Module architecture
|
||||
- **Status:** Fills MEDIUM-priority gap - covers runtime events, admission control, drift detection
|
||||
|
||||
### Notification Rules & Alerting Engine
|
||||
- **Canonical:** `29-Nov-2025 - Notification Rules and Alerting Engine.md`
|
||||
- **Sprint:** SPRINT_0170_0001_0001_notify_engine.md (NEW)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0171_0001_0002_notify_connectors.md
|
||||
- SPRINT_0172_0001_0003_notify_ack_tokens.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/notify/architecture.md` - Module architecture
|
||||
- **Status:** Fills MEDIUM-priority gap - covers rules engine, channels, noise control, ack tokens
|
||||
|
||||
### Graph Analytics & Dependency Insights
|
||||
- **Canonical:** `29-Nov-2025 - Graph Analytics and Dependency Insights.md`
|
||||
- **Sprint:** SPRINT_0141_0001_0001_graph_indexer.md (PRIMARY)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0401_0001_0001_reachability_evidence_chain.md
|
||||
- SPRINT_0140_0001_0001_runtime_signals.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/graph/architecture.md` - Module architecture
|
||||
- **Status:** Fills MEDIUM-priority gap - covers graph model, overlays, analytics, visualization
|
||||
|
||||
### Telemetry & Observability Patterns
|
||||
- **Canonical:** `29-Nov-2025 - Telemetry and Observability Patterns.md`
|
||||
- **Sprint:** SPRINT_0180_0001_0001_telemetry_core.md (NEW)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0181_0001_0002_telemetry_forensic.md
|
||||
- SPRINT_0182_0001_0003_telemetry_offline.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/telemetry/architecture.md` - Module architecture
|
||||
- **Status:** Fills MEDIUM-priority gap - covers collector topology, forensic mode, offline bundles
|
||||
|
||||
### Policy Simulation & Shadow Gates
|
||||
- **Canonical:** `29-Nov-2025 - Policy Simulation and Shadow Gates.md`
|
||||
- **Sprint:** SPRINT_0185_0001_0001_policy_simulation.md (NEW)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0120_0000_0001_policy_reasoning.md
|
||||
- SPRINT_0121_0001_0001_policy_reasoning.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/policy/architecture.md` - Module architecture
|
||||
- **Status:** Fills MEDIUM-priority gap - covers shadow runs, coverage fixtures, promotion gates
|
||||
|
||||
### Findings Ledger & Immutable Audit Trail
|
||||
- **Canonical:** `29-Nov-2025 - Findings Ledger and Immutable Audit Trail.md`
|
||||
- **Sprint:** SPRINT_0186_0001_0001_record_deterministic_execution.md (PRIMARY)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0120_0000_0001_policy_reasoning.md
|
||||
- SPRINT_311_docs_tasks_md_xi.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/findings-ledger/openapi/findings-ledger.v1.yaml` - OpenAPI spec
|
||||
- **Status:** Fills MEDIUM-priority gap - covers append-only events, Merkle anchoring, projections
|
||||
|
||||
### Concelier Advisory Ingestion Model
|
||||
- **Canonical:** `29-Nov-2025 - Concelier Advisory Ingestion Model.md`
|
||||
- **Sprint:** SPRINT_0115_0001_0004_concelier_iv.md (PRIMARY)
|
||||
- **Related Sprints:**
|
||||
- SPRINT_0113_0001_0002_concelier_ii.md
|
||||
- SPRINT_0114_0001_0003_concelier_iii.md
|
||||
- **Related Docs:**
|
||||
- `docs/modules/concelier/architecture.md` - Module architecture
|
||||
- `docs/modules/concelier/link-not-merge-schema.md` - LNM schema
|
||||
- **Status:** Fills MEDIUM-priority gap - covers AOC, Link-Not-Merge, connectors, deterministic exports
|
||||
|
||||
## Files Archived
|
||||
|
||||
The following files have been moved to `archived/27-Nov-2025-superseded/`:
|
||||
@@ -198,6 +299,16 @@ The following issues were fixed:
 | Mirror & Offline Kit | SPRINT_0125_0001_0001 | EXISTING |
 | Task Pack Orchestration | SPRINT_0157_0001_0001 | EXISTING |
 | Auth/AuthZ Architecture | Multiple (100, 314, 0514) | EXISTING |
+| CLI Developer Experience | SPRINT_0201_0001_0001 | NEW |
+| Orchestrator Event Model | SPRINT_0151_0001_0001 | NEW |
+| Export Center Strategy | SPRINT_0160_0001_0001 | NEW |
+| Zastava Runtime Posture | SPRINT_0144_0001_0001 | NEW |
+| Notification Rules Engine | SPRINT_0170_0001_0001 | NEW |
+| Graph Analytics | SPRINT_0141_0001_0001 | NEW |
+| Telemetry & Observability | SPRINT_0180_0001_0001 | NEW |
+| Policy Simulation | SPRINT_0185_0001_0001 | NEW |
+| Findings Ledger | SPRINT_0186_0001_0001 | NEW |
+| Concelier Ingestion | SPRINT_0115_0001_0004 | NEW |

 ## Implementation Priority

@@ -210,11 +321,21 @@ Based on gap analysis:
 5. **P1 - Sovereign Crypto** (Sprint 0514) - Regional compliance enablement
 6. **P1 - Evidence Bundle & Replay** (Sprint 0161, 0187) - Audit/compliance critical
 7. **P1 - Mirror & Offline Kit** (Sprint 0125, 0150) - Air-gap deployment critical
-8. **P2 - Task Pack Orchestration** (Sprint 0157, 0158) - Automation foundation
-9. **P2 - Explainability** (Sprint 0401) - UX enhancement, existing tasks
-10. **P2 - Plugin Architecture** (Multiple) - Foundational extensibility patterns
-11. **P2 - Auth/AuthZ Architecture** (Multiple) - Security consolidation
-12. **P3 - Already Implemented** - Unknowns, Graph IDs, DSSE batching
+8. **P1 - CLI Developer Experience** (Sprint 0201) - Developer UX critical
+9. **P1 - Orchestrator Event Model** (Sprint 0151) - Job lifecycle foundation
+10. **P2 - Task Pack Orchestration** (Sprint 0157, 0158) - Automation foundation
+11. **P2 - Explainability** (Sprint 0401) - UX enhancement, existing tasks
+12. **P2 - Plugin Architecture** (Multiple) - Foundational extensibility patterns
+13. **P2 - Auth/AuthZ Architecture** (Multiple) - Security consolidation
+14. **P2 - Export Center** (Sprint 0160) - Reporting flexibility
+15. **P2 - Zastava Runtime** (Sprint 0144) - Runtime observability
+16. **P2 - Notification Rules** (Sprint 0170) - Alert management
+17. **P2 - Graph Analytics** (Sprint 0141) - Dependency insights
+18. **P2 - Telemetry** (Sprint 0180) - Observability infrastructure
+19. **P2 - Policy Simulation** (Sprint 0185) - Safe policy testing
+20. **P2 - Findings Ledger** (Sprint 0186) - Audit immutability
+21. **P2 - Concelier Ingestion** (Sprint 0115) - Advisory pipeline
+22. **P3 - Already Implemented** - Unknowns, Graph IDs, DSSE batching

 ## Implementer Quick Reference

@@ -241,6 +362,15 @@ For each topic, the implementer should read:
 | Evidence Locker | `docs/modules/evidence-locker/*.md` | `src/EvidenceLocker/*/AGENTS.md` |
 | Mirror | `docs/modules/mirror/*.md` | `src/Mirror/*/AGENTS.md` |
 | TaskRunner | `docs/modules/taskrunner/*.md` | `src/TaskRunner/*/AGENTS.md` |
+| CLI | `docs/modules/cli/architecture.md` | `src/Cli/*/AGENTS.md` |
+| Orchestrator | `docs/modules/orchestrator/architecture.md` | `src/Orchestrator/*/AGENTS.md` |
+| Export Center | `docs/modules/export-center/architecture.md` | `src/ExportCenter/*/AGENTS.md` |
+| Zastava | `docs/modules/zastava/architecture.md` | `src/Zastava/*/AGENTS.md` |
+| Notify | `docs/modules/notify/architecture.md` | `src/Notify/*/AGENTS.md` |
+| Graph | `docs/modules/graph/architecture.md` | `src/Graph/*/AGENTS.md` |
+| Telemetry | `docs/modules/telemetry/architecture.md` | `src/Telemetry/*/AGENTS.md` |
+| Findings Ledger | `docs/modules/findings-ledger/openapi/` | `src/Findings/*/AGENTS.md` |
+| Concelier | `docs/modules/concelier/architecture.md` | `src/Concelier/*/AGENTS.md` |

 ## Topical Gaps (Advisory Needed)

@@ -254,12 +384,17 @@ The following topics are mentioned in CLAUDE.md or module docs but lack dedicate
 | ~~Mirror/Offline Kit Strategy~~ | HIGH | **FILLED** | `29-Nov-2025 - Mirror and Offline Kit Strategy.md` |
 | ~~Task Pack Orchestration~~ | HIGH | **FILLED** | `29-Nov-2025 - Task Pack Orchestration and Automation.md` |
 | ~~Auth/AuthZ Architecture~~ | HIGH | **FILLED** | `29-Nov-2025 - Authentication and Authorization Architecture.md` |
+| ~~CLI Developer Experience~~ | HIGH | **FILLED** | `29-Nov-2025 - CLI Developer Experience and Command UX.md` |
+| ~~Orchestrator Event Model~~ | HIGH | **FILLED** | `29-Nov-2025 - Orchestrator Event Model and Job Lifecycle.md` |
+| ~~Export Center Strategy~~ | MEDIUM | **FILLED** | `29-Nov-2025 - Export Center and Reporting Strategy.md` |
+| ~~Runtime Posture & Observation~~ | MEDIUM | **FILLED** | `29-Nov-2025 - Runtime Posture and Observation with Zastava.md` |
+| ~~Notification Rules Engine~~ | MEDIUM | **FILLED** | `29-Nov-2025 - Notification Rules and Alerting Engine.md` |
+| ~~Graph Analytics & Clustering~~ | MEDIUM | **FILLED** | `29-Nov-2025 - Graph Analytics and Dependency Insights.md` |
+| ~~Telemetry & Observability~~ | MEDIUM | **FILLED** | `29-Nov-2025 - Telemetry and Observability Patterns.md` |
+| ~~Policy Simulation & Shadow Gates~~ | MEDIUM | **FILLED** | `29-Nov-2025 - Policy Simulation and Shadow Gates.md` |
+| ~~Findings Ledger & Audit Trail~~ | MEDIUM | **FILLED** | `29-Nov-2025 - Findings Ledger and Immutable Audit Trail.md` |
+| ~~Concelier Advisory Ingestion~~ | MEDIUM | **FILLED** | `29-Nov-2025 - Concelier Advisory Ingestion Model.md` |
 | **CycloneDX 1.6 .NET Integration** | LOW | Open | Deep Architecture covers generically; expand with .NET-specific guidance |
-| **Findings Ledger & Audit Trail** | MEDIUM | Open | Immutable verdict tracking; module exists but no advisory |
-| **Runtime Posture & Observation** | MEDIUM | Open | Zastava runtime signals; sprints exist but no advisory |
-| **Graph Analytics & Clustering** | MEDIUM | Open | Community detection, blast-radius; implementation underway |
-| **Policy Simulation & Shadow Gates** | MEDIUM | Open | Impact modeling; extensive sprints but no contract advisory |
-| **Notification Rules Engine** | MEDIUM | Open | Throttling, digests, templating; sprints active |

 ## Known Issues (Non-Blocking)

@@ -274,4 +409,4 @@ Several filenames use non-breaking hyphen (U+2011) instead of regular hyphen (-). This may c

 ---
 *Index created: 2025-11-27*
-*Last updated: 2025-11-29*
+*Last updated: 2025-11-29 (added 10 new advisories filling all identified gaps)*