feat: Add comprehensive product advisories for improved scanner functionality

- Introduced a blueprint for explainable quiet alerts, detailing phases for SBOM, VEX readiness, and attestations.
- Developed a roadmap for deterministic diff-aware rescans, enhancing scanner speed and efficiency.
- Implemented a hash-based SBOM layer cache to optimize container scans by reusing previous results.
- Created a multi-runtime reachability corpus to validate function-level reachability across various programming languages.
- Proposed a stable SBOM model using SPDX 3.0.1 for persistence and CycloneDX 1.6 for interchange.
- Established a validation plan for quiet scans, focusing on provenance and CI integration.
- Documented guidelines for the Findings Ledger module, outlining roles, execution rules, and testing protocols.
This commit is contained in: master
2025-11-17 00:09:26 +02:00
parent 08b27b8a26
commit 7b01c7d6ac
73 changed files with 3993 additions and 697 deletions


@@ -0,0 +1,133 @@
Here's a compact, practical way to think about **embedding in-toto provenance attestations directly inside your event payloads** (instead of sidecar files), so your vuln/build graph stays temporally consistent.
---
### Why embed?
* **Atomicity:** build → publish → scan → VEX decisions share one event ID and clock; no dangling sidecars.
* **Replayability:** the event stream alone reproduces state (great for offline kits/audits).
* **Causal joins:** vulnerability findings can cite the exact provenance that led to an image/digest.
---
### Event shape (single, self-contained envelope)
```json
{
"eventId": "01JDN2Q0YB8M…",
"eventType": "build.provenance.v1",
"occurredAt": "2025-11-13T10:22:31Z",
"subject": {
"artifactPurl": "pkg:docker/acme/api@sha256:…",
"digest": {"sha256": "…"}
},
"provenance": {
"kind": "in-toto-provenance",
"dsse": {
"payloadType": "application/vnd.in-toto+json",
"payload": "<base64(in-toto Statement)>",
"signatures": [{"keyid":"…","sig":"…"}]
},
"transparency": {
"rekor": {"logIndex": 123456, "logID": "…", "entryUUID": "…"}
}
},
"sig": {
"envelope": "dsse",
"alg": "Ed25519",
"bundle": { "certChain": ["…"], "timestamp": "…" }
},
"meta": {
"builderId": "https://builder.stella-ops.local/gha",
"buildInvocationId": "gha-run-457812",
"slsa": {"level": 3}
}
}
```
**Notes**
* `provenance.dsse.payload` holds the raw in-toto Statement (subject + predicateType + predicate).
* Keep both **artifact digest** (subject) and **statement subject** (inside payload) and verify they match on ingest.
---
### DB model (Mongo-esque)
* `events` collection: one doc per event (above schema).
* **Compound index:** `{ "subject.digest.sha256": 1, "occurredAt": 1 }`
* **Causal index:** `{ "meta.buildInvocationId": 1 }`
* **Uniqueness guard:** `{ "eventId": 1 } unique` (index bootstrap sketched below)
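A minimal index bootstrap with the MongoDB .NET driver might look like this (the `db` handle and collection name are assumptions):

```csharp
// Hypothetical index setup for the events collection (MongoDB .NET driver).
var events = db.GetCollection<BsonDocument>("events");
var keys = Builders<BsonDocument>.IndexKeys;
await events.Indexes.CreateManyAsync(new[]
{
    // Compound index for temporal joins per digest
    new CreateIndexModel<BsonDocument>(
        keys.Ascending("subject.digest.sha256").Ascending("occurredAt")),
    // Causal index
    new CreateIndexModel<BsonDocument>(keys.Ascending("meta.buildInvocationId")),
    // Uniqueness guard for idempotent ingest
    new CreateIndexModel<BsonDocument>(
        keys.Ascending("eventId"), new CreateIndexOptions { Unique = true })
});
```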
---
### Ingest pipeline (deterministic)
1. **Verify DSSE:** check signature, cert roots (or offline trust bundle).
2. **Validate Statement:** subject digests, builder ID, predicateType.
3. **Upsert artifact node:** keyed by digest; attach `lastProvenanceEventId`.
4. **Append event:** write once; never mutate (event-sourced).
5. **Emit derived edges:** `(builderId) --built--> (artifact@digest)` with `occurredAt`.
---
### Joining scans to provenance (temporal consistency)
* When a scan event arrives, resolve the **latest provenance event with `occurredAt ≤ scan.occurredAt`** for the same digest (query sketched below).
* Store an edge `(artifact@digest) --scannedWith--> (scanner@version)` with a **pointer to the provenance eventId** used for policy.
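A hedged sketch of that temporal join against the `events` collection from the DB model above; `scanDigest` and `scanOccurredAt` come from the scan event, and the query leans on the compound index:

```csharp
// Latest provenance event at or before the scan time, for the same digest.
var f = Builders<BsonDocument>.Filter;
var filter = f.Eq("subject.digest.sha256", scanDigest)
           & f.Eq("eventType", "build.provenance.v1")
           & f.Lte("occurredAt", scanOccurredAt);

var provenanceEvent = await events.Find(filter)
    .Sort(Builders<BsonDocument>.Sort.Descending("occurredAt"))
    .Limit(1)
    .FirstOrDefaultAsync(); // null => no admissible provenance; policy decides what happens
```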
---
### Minimal .NET 10 contracts
```csharp
// Supporting records (DsseSig, Digest, SigMeta, Meta, Transparency) elided for brevity.
public sealed record DsseEnvelope(string PayloadType, string Payload, IReadOnlyList<DsseSig> Signatures);
public sealed record Provenance(string Kind, DsseEnvelope Dsse, Transparency? Transparency);
public sealed record EventSubject(string ArtifactPurl, Digest Digest);
public sealed record EventEnvelope(
string EventId, string EventType, DateTime OccurredAt,
EventSubject Subject, Provenance Provenance, SigMeta Sig, Meta Meta);
public interface IEventVerifier {
ValueTask VerifyAsync(EventEnvelope ev, CancellationToken ct);
}
public interface IEventIngestor {
ValueTask IngestAsync(EventEnvelope ev, CancellationToken ct); // verify->validate->append->derive
}
```
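A minimal ingestor wiring those contracts to the pipeline above could look like the following sketch; `IEventStore` and `IGraphWriter` are assumed names for the append-only store and derived-edge writer:

```csharp
public sealed class EventIngestor : IEventIngestor
{
    private readonly IEventVerifier _verifier; // DSSE + Statement validation
    private readonly IEventStore _store;       // hypothetical append-only event store
    private readonly IGraphWriter _graph;      // hypothetical artifact/edge writer

    public EventIngestor(IEventVerifier verifier, IEventStore store, IGraphWriter graph)
    {
        _verifier = verifier;
        _store = store;
        _graph = graph;
    }

    public async ValueTask IngestAsync(EventEnvelope ev, CancellationToken ct)
    {
        await _verifier.VerifyAsync(ev, ct);                          // steps 1-2
        await _graph.UpsertArtifactAsync(ev.Subject, ev.EventId, ct); // step 3
        await _store.AppendAsync(ev, ct);                             // step 4: write once
        await _graph.AddBuiltEdgeAsync(
            ev.Meta, ev.Subject, ev.OccurredAt, ct);                  // step 5: derived edge
    }
}
```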
---
### Policy hooks (VEX/Trust Algebra)
* **Rule:** “Only trust findings if the scan's referenced provenance has `builderId ∈ AllowedBuilders` and `SLSA ≥ 3` and `time(scan) - time(prov) ≤ 24h`.” (a predicate sketch follows)
* **Effect:** drops stale/forged results and aligns all scoring to one timeline.
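That rule reduces to a small pure predicate; the sketch below assumes `Meta` exposes `BuilderId` and `Slsa.Level` (neither is pinned down by the contracts above):

```csharp
// True when the provenance event satisfies the trust rule for this scan event.
static bool TrustScan(EventEnvelope scan, EventEnvelope prov, IReadOnlySet<string> allowedBuilders)
    => allowedBuilders.Contains(prov.Meta.BuilderId)               // builderId ∈ AllowedBuilders
    && prov.Meta.Slsa.Level >= 3                                   // SLSA ≥ 3
    && scan.OccurredAt >= prov.OccurredAt                          // provenance precedes scan
    && scan.OccurredAt - prov.OccurredAt <= TimeSpan.FromHours(24);
```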
---
### Migration from sidecars
1. **Dual-write** for one sprint: keep emitting sidecars, but also embed DSSE in events.
2. Add **backfill job**: wraps historical sidecars into `build.provenance.v1` events (preserve original timestamps).
3. Flip **consumers** (scoring/VEX) to **require `provenance` in the event**; keep sidecar reader only for legacy imports.
---
### Failure & edge cases
* **Oversized payloads:** gzip the DSSE payload; cap the event body (e.g., 512 KB) and store overflow in `provenance.ref` (content-addressed blob) while **hash-linking** it in the event.
* **Multiple subjects:** keep the Statement intact; still key the event by the **primary digest** you care about, but validate all subjects.
---
### Quick checklist to ship
* [ ] Event schema & JSON schema with strict types (no additionalProperties).
* [ ] DSSE + in-toto validators (offline trust bundles supported).
* [ ] Mongo indexes + append-only writer.
* [ ] Temporal join in scanner consumer (≤ O(log n) via index).
* [ ] VEX rules referencing `event.meta` & `provenance.dsse`.
* [ ] Backfill task for legacy sidecars.
* [ ] Replay test: rebuild graph from events only → identical results.
If you want, I can turn this into ready-to-drop **.proto + C# models**, plus a Mongo migration script and a tiny verifier service.


@@ -0,0 +1,103 @@
Here's a tight idea I think you'll like: **make every VEX “non-affected” verdict explain itself with provable, symbol-level evidence**—not just “package X isn't reachable,” but “function `Foo::bar()` (the vulnerable sink) is never called in any admissible execution of image Y,” backed by cryptographic provenance.
---
# Why this matters (quickly)
* **Trust**: Auditors and customers can verify why you suppressed a CVE.
* **Quiet scanner**: Fewer false alarms because decisions cite concrete call paths (or their absence).
* **Moat**: Competitors stop at file/package reachability; you show **function-level** proof tied to in-toto attestations.
---
# Core concept (plain)
Blend two things:
1. **Deterministic symbol reachability** (per language): build minimal call graphs and mark whether the vulnerable symbol is callable from your app's entrypoints.
2. **in-toto-anchored provenance**: sign the *inputs and reasoning* (rules, SBOM slice, call-graph hash, evidence artifacts), so the verdict can be independently re-verified.
Result: each VEX decision is a **verifiable mini-proof**.
---
# What the evidence looks like (per CVE/component)
* **Symbol set**: canonical IDs of vulnerable functions (e.g., `pkg@ver#Type::Method(sig)`).
* **Call-graph digest**: hash of the pruned call graph from app entrypoints to those symbols.
* **Evidence**:
* Static: “No path from any entrypoint → {vuln symbols} (k=0).”
* Optional runtime: sampled traces (EventPipe/JFR/eBPF) show **0 hits** to symbols/guards.
* **Context**: build inputs (SBOM, lockfiles, compile units), framework models used, versions.
* **Attestation**: in-toto/DSSE signed bundle with a reproducible scan manifest.
---
# Minimal prototype this week (Scanner reachability scorer)
1. **Symbol mappers (MVP)**
* .NET: read PDB + IL to enumerate `MethodDef` symbols; map NuGet pkg → assembly → methods.
* JVM: JAR index + method table (from ASM); map Maven coords → classes → methods.
2. **Entrypoint discovery**
* Docker CMD/ENTRYPOINT → process launch → managed main(s) (ASP.NET Program.Main, Spring Boot main).
3. **Shallow call graph** (no fancy points-to yet):
* Direct calls + common framework handoffs (ASP.NET routing → controller; Spring @RequestMapping → handler).
4. **Vuln ↔ symbol alignment**
* Heuristics: match GHSA/OSV “affected functions” or the patch diff to infer symbol names; fall back to a package-scope verdict with a flag `symbol-inferred: false`.
5. **Decision object**
* `ReachabilityDecision.json` with: entrypoints, symbol set, path_count, notes, hashes (a possible record shape follows this list).
6. **Attest**
* Emit `reachability.intoto.jsonl` (subject = image digest + SBOM component + symbol digest). Cosign with your test key.
7. **VEX output**
* OpenVEX statement reason: `component_not_present` or `vulnerable_code_not_in_execute_path` with `justification_url` → small HTML report (signed).
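A possible C# shape for the decision object (field names are illustrative, not a fixed schema):

```csharp
// Hypothetical ReachabilityDecision.json contract.
public sealed record ReachabilityDecision(
    string CveId,
    string ComponentPurl,
    IReadOnlyList<string> EntryPoints,        // normalized entrypoints for the image
    IReadOnlyList<string> VulnerableSymbols,  // canonical IDs, e.g. pkg@ver#Type::Method(sig)
    int PathCount,                            // 0 => no path from any entrypoint
    bool SymbolInferred,                      // true when symbols were matched heuristically
    string CallGraphDigest,                   // hash of the pruned call graph
    string Notes);
```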
---
# Data & schemas to add
* `Scanner.Reachability/`
* `SymbolIndex` (pkg → assemblies/classes/methods)
* `EntryPoints` (per image, normalized)
* `CallGraphPruned` (edges + hash)
* `Decision` (path_count, evidence, versions)
* `Authority`
* Key management for DSSE; policy to **require** reachability evidence for “non-affected”.
---
# Language roadmap (fast win → harder)
* **Week 1-2:** .NET + JVM shallow graphs + ASP.NET/Spring models.
* **Week 3-4:** Node/TS (TS compiler API), Go (SSA), Python (import graph + common web frameworks).
* **Stretch:** Rust/Swift (MIR/SIL summaries), native (symbols + coarse edges), shell (exec chain).
---
# Where to surface it (UX)
* VEX details panel: “Non-affected (0 call paths)” → expand → show entrypoints, collapsed edge list, trace hit counts, and a “Verify attestation” button.
* CLI: `stella scan --explain CVE-XYZ --verify-attestation`.
---
# Guardrails
* If symbol mapping is uncertain, **do not** claim “non-affected”; downgrade to “under review” with a rationale.
* Cache symbol indexes per package version; keep the call graph pruned to entrypoint cones for speed.
* Everything reproducible: the **scan manifest** (feeds, rule versions, hashes) must recreate the same decision bit-for-bit.
---
If you want, I can draft the exact C# namespaces, interfaces, and the OpenVEX + in-toto payloads you can drop into `Scanner.Reachability` and `Authority.Attest`.


@@ -0,0 +1,488 @@
Here is a complete, implementation-ready sketch you can drop into your solution and tune.
I assume:
* ASP.NET Core Web API (.NET 10).
* EF `DbContext` with `DbSet<PolCensusList>`.
* Excel via **ClosedXML** (clean API, MIT license, built on OpenXML).
---
## 1. NuGet packages
Add to the Web/API project:
```bash
dotnet add package ClosedXML
dotnet add package DocumentFormat.OpenXml
```
---
## 2. File repository abstraction
This matches your requirement: upload/download by `bucketId` + `fileId`, plus stream variants.
```csharp
public interface IFileRepository
{
// Uploads a file identified by bucketId + fileId from a Stream
Task UploadAsync(
string bucketId,
string fileId,
Stream content,
string contentType,
CancellationToken cancellationToken = default);
// Uploads a file from an in-memory buffer
Task UploadAsync(
string bucketId,
string fileId,
byte[] content,
string contentType,
CancellationToken cancellationToken = default);
// Downloads a file as a Stream (caller is responsible for disposing)
Task<Stream> DownloadAsStreamAsync(
string bucketId,
string fileId,
CancellationToken cancellationToken = default);
// Downloads a file as a byte[] buffer
Task<byte[]> DownloadAsBytesAsync(
string bucketId,
string fileId,
CancellationToken cancellationToken = default);
}
```
Example of a simple implementation over some `IFileStoreClient` (adjust to your FileStore API):
```csharp
public sealed class FileStoreRepository : IFileRepository
{
private readonly IFileStoreClient _client;
public FileStoreRepository(IFileStoreClient client)
{
_client = client;
}
public async Task UploadAsync(
string bucketId,
string fileId,
Stream content,
string contentType,
CancellationToken cancellationToken = default)
{
// Example: adapt to your real client
await _client.PutObjectAsync(
bucketId: bucketId,
objectId: fileId,
content: content,
contentType: contentType,
cancellationToken: cancellationToken);
}
public async Task UploadAsync(
string bucketId,
string fileId,
byte[] content,
string contentType,
CancellationToken cancellationToken = default)
{
await using var ms = new MemoryStream(content, writable: false);
await UploadAsync(bucketId, fileId, ms, contentType, cancellationToken);
}
public async Task<Stream> DownloadAsStreamAsync(
string bucketId,
string fileId,
CancellationToken cancellationToken = default)
{
// Must return a readable Stream ready for ClosedXML
return await _client.GetObjectStreamAsync(
bucketId: bucketId,
objectId: fileId,
cancellationToken: cancellationToken);
}
public async Task<byte[]> DownloadAsBytesAsync(
string bucketId,
string fileId,
CancellationToken cancellationToken = default)
{
await using var stream = await DownloadAsStreamAsync(bucketId, fileId, cancellationToken);
using var ms = new MemoryStream();
await stream.CopyToAsync(ms, cancellationToken);
return ms.ToArray();
}
}
```
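For reference, the repository above assumes a client roughly shaped like this (`IFileStoreClient` is not defined elsewhere in this sketch, so the signatures are illustrative):

```csharp
public interface IFileStoreClient
{
    // Stores an object under bucketId/objectId.
    Task PutObjectAsync(
        string bucketId,
        string objectId,
        Stream content,
        string contentType,
        CancellationToken cancellationToken = default);

    // Returns a readable stream for the stored object.
    Task<Stream> GetObjectStreamAsync(
        string bucketId,
        string objectId,
        CancellationToken cancellationToken = default);
}
```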
Register in DI:
```csharp
builder.Services.AddScoped<IFileRepository, FileStoreRepository>();
```
---
## 3. Import service for `PolCensusList` from Excel
To keep the controller thin, put Excel parsing + EF into a service.
Assumptions (adjust as needed):
* The file is an `.xlsx` with a header row.
* Data starts at row 2.
* Columns are:
| Column | Excel | Property |
| ------ | ----- | -------------- |
| A | 1 | CustPid |
| B | 2 | Gname |
| C | 3 | Sname |
| D | 4 | Fname |
| E | 5 | BirthDate |
| F | 6 | Gender |
| G | 7 | Bmi |
| H | 8 | Dependant |
| I | 9 | DependantOn |
| J | 10 | MemberAction |
| K | 11 | GrpCode |
| L | 12 | BeginDate |
| M | 13 | SrCustId |
| N | 14 | MemberPolicyId |
| O | 15 | MemberAnnexId |
| P | 16 | ErrMsg |
Other fields (`SrPolicyId`, `SrAnnexId`, `FileId`, `Tstamp`) are taken from parameters/system.
```csharp
using System.Globalization;
using ClosedXML.Excel;
using Microsoft.EntityFrameworkCore;
public interface IPolCensusImportService
{
Task<int> ImportFromExcelAsync(
string bucketId,
string fileId,
decimal srPolicyId,
decimal srAnnexId,
CancellationToken cancellationToken = default);
}
public sealed class PolCensusImportService : IPolCensusImportService
{
private readonly SerdicaHealthContext _dbContext;
private readonly IFileRepository _fileRepository;
public PolCensusImportService(
SerdicaHealthContext dbContext,
IFileRepository fileRepository)
{
_dbContext = dbContext;
_fileRepository = fileRepository;
}
public async Task<int> ImportFromExcelAsync(
string bucketId,
string fileId,
decimal srPolicyId,
decimal srAnnexId,
CancellationToken cancellationToken = default)
{
await using var stream = await _fileRepository.DownloadAsStreamAsync(bucketId, fileId, cancellationToken);
using var workbook = new XLWorkbook(stream);
var worksheet = workbook.Worksheets.First();
var now = DateTime.UtcNow;
var entities = new List<PolCensusList>();
const int headerRow = 1;
var firstDataRow = headerRow + 1;
for (var row = firstDataRow; ; row++)
{
var rowRange = worksheet.Row(row);
if (rowRange.IsEmpty()) break; // Stop on first fully empty row
// Minimal “empty row” check: no CustPid and no name => stop
var custPidCell = rowRange.Cell(1);
var gnameCell = rowRange.Cell(2);
var snameCell = rowRange.Cell(3);
if (custPidCell.IsEmpty() && gnameCell.IsEmpty() && snameCell.IsEmpty())
{
break;
}
var entity = new PolCensusList
{
// Non-null FK fields from parameters
SrPolicyId = srPolicyId,
SrAnnexId = srAnnexId,
CustPid = custPidCell.GetString().Trim(),
Gname = gnameCell.GetString().Trim(),
Sname = snameCell.GetString().Trim(),
Fname = rowRange.Cell(4).GetString().Trim(),
BirthDate = GetDate(rowRange.Cell(5)),
Gender = rowRange.Cell(6).GetString().Trim(),
Bmi = GetDecimal(rowRange.Cell(7)),
Dependant = rowRange.Cell(8).GetString().Trim(),
DependantOn = rowRange.Cell(9).GetString().Trim(),
MemberAction = rowRange.Cell(10).GetString().Trim(),
GrpCode = rowRange.Cell(11).GetString().Trim(),
BeginDate = GetNullableDate(rowRange.Cell(12)),
SrCustId = GetNullableDecimal(rowRange.Cell(13)),
MemberPolicyId= GetNullableDecimal(rowRange.Cell(14)),
MemberAnnexId = GetNullableDecimal(rowRange.Cell(15)),
ErrMsg = rowRange.Cell(16).GetString().Trim(),
// Audit / technical fields
Tstamp = now,
FileId = fileId,
// Attr* left null for now; can be mapped later if needed
};
entities.Add(entity);
}
await using var transaction = await _dbContext.Database.BeginTransactionAsync(cancellationToken);
try
{
await _dbContext.PolCensusLists.AddRangeAsync(entities, cancellationToken);
var affected = await _dbContext.SaveChangesAsync(cancellationToken);
await transaction.CommitAsync(cancellationToken);
return affected;
}
catch
{
await transaction.RollbackAsync(cancellationToken);
throw;
}
}
private static DateTime GetDate(IXLCell cell)
{
if (cell.DataType == XLDataType.DateTime &&
cell.GetDateTime() != default)
{
return cell.GetDateTime().Date;
}
var raw = cell.GetString().Trim();
if (string.IsNullOrEmpty(raw))
throw new InvalidOperationException("BirthDate is required but empty.");
// Try a few reasonable formats; extend if needed
var formats = new[]
{
"dd.MM.yyyy",
"dd/MM/yyyy",
"yyyy-MM-dd",
"M/d/yyyy",
};
if (DateTime.TryParseExact(raw, formats,
CultureInfo.InvariantCulture,
DateTimeStyles.AssumeLocal,
out var dt))
{
return dt.Date;
}
if (DateTime.TryParse(raw, CultureInfo.CurrentCulture,
DateTimeStyles.AssumeLocal, out var dt2))
{
return dt2.Date;
}
throw new FormatException($"Cannot parse date value '{raw}'.");
}
private static DateTime? GetNullableDate(IXLCell cell)
{
if (cell.IsEmpty()) return null;
if (cell.DataType == XLDataType.DateTime &&
cell.GetDateTime() != default)
{
return cell.GetDateTime().Date;
}
var raw = cell.GetString().Trim();
if (string.IsNullOrEmpty(raw)) return null;
var formats = new[]
{
"dd.MM.yyyy",
"dd/MM/yyyy",
"yyyy-MM-dd",
"M/d/yyyy",
};
if (DateTime.TryParseExact(raw, formats,
CultureInfo.InvariantCulture,
DateTimeStyles.AssumeLocal,
out var dt))
{
return dt.Date;
}
if (DateTime.TryParse(raw, CultureInfo.CurrentCulture,
DateTimeStyles.AssumeLocal, out var dt2))
{
return dt2.Date;
}
return null;
}
private static decimal GetDecimal(IXLCell cell)
{
if (cell.IsEmpty()) return 0m;
if (cell.DataType == XLDataType.Number)
{
return (decimal)cell.GetDouble();
}
var raw = cell.GetString().Trim();
if (string.IsNullOrEmpty(raw)) return 0m;
// Try invariant and current culture
if (decimal.TryParse(raw, NumberStyles.Any, CultureInfo.InvariantCulture, out var result))
return result;
if (decimal.TryParse(raw, NumberStyles.Any, CultureInfo.CurrentCulture, out result))
return result;
throw new FormatException($"Cannot parse decimal value '{raw}'.");
}
private static decimal? GetNullableDecimal(IXLCell cell)
{
if (cell.IsEmpty()) return null;
if (cell.DataType == XLDataType.Number)
{
return (decimal)cell.GetDouble();
}
var raw = cell.GetString().Trim();
if (string.IsNullOrEmpty(raw)) return null;
if (decimal.TryParse(raw, NumberStyles.Any, CultureInfo.InvariantCulture, out var result))
return result;
if (decimal.TryParse(raw, NumberStyles.Any, CultureInfo.CurrentCulture, out result))
return result;
return null;
}
}
```
Register the service:
```csharp
builder.Services.AddScoped<IPolCensusImportService, PolCensusImportService>();
```
---
## 4. API endpoint: import census data by `fileId`
Controller for IPAL / Serdica health module:
```csharp
using Microsoft.AspNetCore.Mvc;
[ApiController]
[Route("api/health/census")]
public sealed class PolCensusController : ControllerBase
{
private readonly IPolCensusImportService _importService;
public PolCensusController(IPolCensusImportService importService)
{
_importService = importService;
}
/// <summary>
/// Imports census members from an Excel file stored in FileStore.
/// </summary>
/// <param name="bucketId">Logical bucket / container in FileStore (e.g. "ipal-serdica-census").</param>
/// <param name="fileId">Unique file identifier in FileStore.</param>
/// <param name="srPolicyId">Serdica policy id (FK for PolCensusList).</param>
/// <param name="srAnnexId">Serdica annex id (FK for PolCensusList).</param>
[HttpPost("import/{bucketId}/{fileId}")]
public async Task<IActionResult> ImportFromFile(
string bucketId,
string fileId,
[FromQuery] decimal srPolicyId,
[FromQuery] decimal srAnnexId,
CancellationToken cancellationToken)
{
var importedCount = await _importService.ImportFromExcelAsync(
bucketId,
fileId,
srPolicyId,
srAnnexId,
cancellationToken);
return Ok(new
{
imported = importedCount,
fileId,
bucketId,
srPolicyId,
srAnnexId
});
}
}
```
---
## 5. Optional: upload endpoint using `IFileRepository`
If you want a simple upload entry point compatible with the repository:
```csharp
[HttpPost("upload")]
public async Task<IActionResult> UploadCensusFile(
[FromForm] IFormFile file,
[FromQuery] string bucketId,
CancellationToken cancellationToken)
{
if (file == null || file.Length == 0)
return BadRequest("File is required.");
var fileId = Guid.NewGuid().ToString("N");
await using var stream = file.OpenReadStream();
await _fileRepository.UploadAsync(
bucketId,
fileId,
stream,
file.ContentType ?? "application/vnd.openxmlformats-officedocument.spreadsheetml.sheet",
cancellationToken);
return Ok(new { fileId, bucketId });
}
```
---
If you tell me:
* the exact Excel column layout you will get from IPAL / Serdica,
* whether `CensusId` is sequence-generated in Oracle or must be populated in code,
I can tighten the mapping + EF configuration so that it matches your schema 1:1 and is ready for production.


@@ -0,0 +1,131 @@
Here's a compact, implementation-ready blueprint to make your scanner's results quiet, explainable, and auditable end-to-end.
# Phase the “proof spine”
1. **SBOM-only → VEX-ready → Attested**
* **SBOM (now):** Generate SPDX 3.0.1 + CycloneDX 1.6 for every image/module. Include purls, CPE (if available), license IDs, source URIs, and build metadata.
* **VEX-ready (next):** Normalize vuln inputs (OSV, GHSA, vendor feeds) to a single internal model; keep the fields needed for VEX (status, justification, impact, action, timestamp, issuer).
* **Attest (then):** Emit **in-toto/DSSE** attestations that bind: (a) SBOM digest, (b) ruleset version, (c) data sources & hashes, (d) VEX decisions. Log statement references in **Rekor** (or your mirror) for transparency.
# Explainability path (per alert)
For every surfaced finding, materialize:
* **Origin SBOM node** → component@version (with purl/CPE)
* **Match rule** → which matcher hit (name+version, range, CPE heuristics, source trust)
* **VEX gate** → decision with justification (e.g., affected/not_affected, component_not_present, configuration_needed)
* **Reachability trace** → static (call graph path) and/or runtime (probe hits) to the vulnerable symbol(s)
* **Deterministic score** → numeric risk built from stable inputs (below)
Expose this as a single JSON object and a short, human-readable proof block in the UI/CLI.
# SmartDiff (incremental analysis)
* **Change detector:** hash symbols/packages and dependency graphs; on new scans, diff against prior state.
* **Selective re-analysis:** only re-parse/re-resolve changed modules, lockfiles, or call-graph regions.
* **Memoized match & reachability:** cache vuln matches and reachability slices per (component, version, framework-model) key.
# Scoring (quiet by design)
Use stable, auditable inputs (a minimal scoring sketch follows this list):
* **Base:** CVSS v4.0 metrics (as provided by the source); fall back to v3.1 if v4 is missing.
* **Exploit maturity:** explicit flags when present (known exploited, PoC available, none).
* **Reachability boost/penalty:** function-level confirmation > package-level guess; runtime evidence > static-only.
* **Compensating controls:** WAF/feature flags/sandboxing recorded as gates that reduce surfaced priority (but never erase provenance).
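A deterministic scorer over those inputs is just a pure function; this sketch uses illustrative, uncalibrated weights:

```csharp
// Stable inputs in, same number out; weights here are placeholders, not policy.
static double FinalScore(double cvssBase, string exploitMaturity,
                         bool functionConfirmed, bool runtimeEvidence, bool compensated)
{
    var score = cvssBase;
    score += exploitMaturity switch
    {
        "known_exploited" => 1.0,
        "poc"             => 0.8,
        _                 => 0.0
    };
    if (functionConfirmed) score += runtimeEvidence ? 1.3 : 1.1; // runtime > static-only
    if (compensated)       score -= 1.5; // gate lowers priority; provenance kept elsewhere
    return Math.Clamp(score, 0.0, 10.0);
}
```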
# Minimal data contracts (copy-paste into your code)
**SBOM node (core):**
```json
{
"purl": "pkg:npm/lodash@4.17.21",
"hashes": [{"alg":"sha256","value":"..."}],
"licenses": ["MIT"],
"build": {"sourceUri":"git+https://...","commit":"..."},
"attestations": [{"type":"intoto","subjectDigest":"sha256:..."}]
}
```
**Finding proof (per alert):**
```json
{
"id": "FND-abc123",
"component": {"purl":"pkg:maven/org.example/foo@1.2.3"},
"vuln": {"id":"CVE-2024-XXXX","source":"OSV"},
"matchRule": {"name":"purl-eq","details":{"range":"[1.2.0,1.2.5)"}},
"vexGate": {"status":"affected","justification":"reachable_code_path"},
"reachability": {
"staticPath": ["Controller.handle","Service.parse","lib/vulnFunc"],
"runtimeHits": [{"symbol":"lib/vulnFunc","count":37}]
},
"score": {"base":7.1,"exploit":"poc","reach":"function","final":8.4},
"provenance": {
"sbomDigest":"sha256:...",
"ruleset":"signals-1.4.2",
"feeds":[{"name":"OSV","etag":"..."}],
"attRef":"rekor:sha256:..."
}
}
```
# Services & where they live in StellaOps
* **Sbomer**: Syft-backed generators (SPDX/CycloneDX) + DSSE signing.
* **Feedser/Concelier**: fetch & normalize vuln feeds (OSV/GHSA/vendor), maintain trust scores; the “preserve-prune source” rule stays.
* **Scanner.WebService**: orchestrates analyzers; run lattice algorithms here (per your standing rule).
* **Vexer/Excititor**: VEX issuance + policy evaluation (lattice gates).
* **Authority**: key management, DSSE signing, Rekor client (and mirror) endpoints.
* **Signals**: event-sourced store for proofs, reachability artifacts, and scoring outputs.
# Policies (tiny DSL sketch)
```yaml
version: 1
sources:
- id: osv
trust: 0.9
gates:
- id: not-present
when: component.present == false
action: vex(status: not_affected, reason: component_not_present)
- id: unreachable
when: reachability.static == false and reachability.runtime == false
action: vex(status: not_affected, reason: vulnerable_code_not_in_execute_path)
scoring:
base: cvss.v4 or cvss.v3
adjust:
- if: exploit.maturity in ["known_exploited","poc"]
add: 0.8
- if: reachability.function_confirmed
add: 1.1
- if: gate == "not-present"
subtract: 3.0
```
# Attestations & transparency (pragmatic path)
* **Produce** DSSE-wrapped in-toto statements for SBOM, ScanResult, and VEXBundle.
* **Record** statement digests in Rekor (or your **ProofMarket** mirror) with pointers back to your artifact store.
* **Bundle** offline kits with SBOM+VEX+attestations and a mini-Rekor log segment for air-gapped audits.
# UX: one-screen truth
* Table of findings with a **Final Score**, a **“Why?”** button expanding the five-part proof chain, and **Fix** suggestions.
* Global toggles: *Show only reachable*, *Mute not-affected*, *Show deltas* (SmartDiff), *Export VEX*.
# “Done next” checklist
* Wire Syft→SPDX/CycloneDX→DSSE emit → Rekor client.
* Normalize feeds to a single vuln model with trust weights.
* Implement **FindingProof** schema and persist it in Signals.
* Add **Symbolizer + per-language reachability** stubs (even minimal) to populate `reachability` fields.
* Ship VEX export (OpenVEX/CSAF) based on current gates.
* Add SmartDiff over SBOM + symbol graph hashes.
* Surface the full proof chain in UI/CLI.
If you want, I can drop in concrete .NET 10 interfaces/classes for each component and a first pass of the Rekor/DSSE helpers next.


@@ -0,0 +1,102 @@
Here's a compact, plain-English plan to make your scanner **faster, quieter, and auditor-friendly** by (1) diff-aware rescans and (2) unified binary+source reachability—both drop-in for StellaOps.
# Deterministic, diff-aware rescans (clean SBOM/VEX diffs)
**Goal:** Only recompute what changed; emit stable, minimal diffs reviewers can trust.
**Core ideas**
- **Per-layer SBOM artifacts (cacheable):** For each image layer `L#`, persist:
- `sbom-L#.cdx.json` (CycloneDX), `hash(L#)`, `toolchain-hash`, `feeds-hash`.
- **Symbol fingerprints** for each discovered file: `algo|path|size|mtime|xxh3|funcIDs[]`.
- **Slice recomputation:** On new image `I'`, match layers via hashes; for changed layers or files, recompute *only* their call-graph slices and vuln joins.
- **Deterministic manifests:** Every scan writes a `scan.lock.json` (inputs, feed versions, rules, lattice policy hash, tool versions, clocks) so results are **replayable**.
**Minimal data model (Mongo)**
- `scan_runs(_id, imageDigest, inputsHash, policyHash, feedsHash, startedAt, finishedAt, parentRunId?)`
- `layer_sboms(scanRunId, layerDigest, sbomCid, symbolIndexCid, layerHash)`
- `file_symbols(scanRunId, path, fileHash, funcIDs[], lang, size, mtime)`
- `diffs(fromRunId, toRunId, kind: 'sbom'|'vex'|'reachability', stats, patch)` (store JSON Patch)
**Algorithm sketch** (a C# sketch of the layer-reuse step follows this list)
1. Resolve base image ancestry → map `old layer digest ↔ new layer digest`.
2. For unchanged layers: reuse `layer_sboms` + `file_symbols`.
3. For changed/added files: re-symbolize + re-analyze; restrict call-graph build to **impacted SCCs**.
4. Rejoin OSV/GHSA/vendor vulns → compute reachability deltas → emit **stable JSON Patch**.
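A hedged sketch of the reuse decision in steps 1-3 (`LayerArtifacts`, `ICacheStore`, and `ReanalyzeLayerAsync` are assumed names):

```csharp
// Reuse cached per-layer artifacts; re-analyze only changed/added layers.
async Task<List<LayerArtifacts>> ResolveLayersAsync(
    IReadOnlyList<string> layerDigests, ICacheStore cache, CancellationToken ct)
{
    var results = new List<LayerArtifacts>();
    foreach (var digest in layerDigests)
    {
        var cached = await cache.GetLayerAsync(digest, ct);           // hit: unchanged layer
        results.Add(cached ?? await ReanalyzeLayerAsync(digest, ct)); // miss: recompute slice
    }
    return results;
}
```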
**CLI impact**
- `stella scan --deterministic --cache-dir ~/.stella/cache --emit-diff previousRunId`
- `stella diff --from <runA> --to <runB> --format jsonpatch|md`
---
# Unified binary + source reachability (function-level)
**Goal:** Decide “is the vulnerable function reachable/used here?” across native and managed code.
**Extraction**
- **Binary symbolizers:**
- ELF: parse `.symtab`/`.dynsym`, DWARF (if present).
- MachO/PE: export tables + DWARF/PDB (if present).
- Build a **Canonical Symbol ID (CSID)**: `lang:pkg@ver!binary#file:function(signature)`; normalize C++/Rust mangling (a builder sketch follows this list).
- **Source symbolizers:**
- .NET (Roslyn+IL), JVM (bytecode), Go (SSA), Node/TS (TS AST), Python (AST), Rust (HIR/MIR if available).
- **Bindings join:** Map FFI edges (P/Invoke, cgo, JNI/JNA, N-API) → **cross-ecosystem call edges**:
- `.NET P/Invoke` → DLL export CSID.
- Java JNI → `Java_com_pkg_Class_Method` ↔ native export.
- Node N-API → addon exports ↔ JS require() site.
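The CSID itself is a deterministic string build; a minimal illustration (demangling would be delegated to a real demangler):

```csharp
// Canonical Symbol ID: lang:pkg@ver!binary#file:function(signature)
static string BuildCsid(string lang, string pkg, string version,
                        string binary, string file, string functionSignature)
    => $"{lang}:{pkg}@{version}!{binary}#{file}:{functionSignature}";

// e.g. BuildCsid("cpp", "openssl", "3.0.13", "libssl.so.3", "ssl/ssl_lib.c", "SSL_free(SSL*)")
```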
**Reachability pipeline**
1. Build per-language call graphs (CG) with framework models (ASP.NET, Spring, Express, etc.).
2. Add FFI edges; merge into a **polyglot call graph**.
3. Mark **entrypoints** (container `CMD/ENTRYPOINT`, web handlers, cron, CLI verbs).
4. For each CVE → {pkg, version, affected symbols[]} map → **is any affected CSID on a path from an entrypoint?**
5. Output evidence:
- `reachable: true|false|unknown`
- shortest path (symbols list)
- probes (optional): runtime samples (EventPipe/JFR/uprobes) hitting CSIDs
**Artifacts emitted**
- `symbols.csi.jsonl` (all CSIDs)
- `polyglot.cg.slices.json` (only impacted SCCs for diffs)
- `reach.vex.json` (OpenVEX/CSAF with function-level notes + confidence)
---
# What to build next (low-risk, high-impact)
- **[Week 1-2]** Per-layer caches + `scan.lock.json`; file symbol fingerprints (xxh3 + top-K funcIDs).
- **[Week 3-4]** ELF/PE/Mach-O symbolizer lib with CSIDs; .NET IL + P/Invoke mapper.
- **[Week 5-6]** Polyglot CG merge + entrypoint discovery from Docker metadata; JSON Patch diffs.
- **[Week 7+]** Runtime probes (opt-in) to boost confidence and suppress false positives.
---
# Tiny code seeds (C# hints)
**Symbol fingerprint (per file)**
```csharp
record SymbolFingerprint(
string Algo, string Path, long Size, long MTimeUnix,
string ContentHash, string[] FuncIds);
```
**Deterministic scan lock**
```csharp
record ScanLock(
string FeedsHash, string RulesHash, string PolicyHash, string Toolchain,
string ImageDigest, string[] LayerDigests, DateTimeOffset Clock,
IDictionary<string,string> EnvPins);
```
**JSON Patch diff emit**
```csharp
// using JsonDiffPatchDotNet (JsonDiffPatch.Net) + Newtonsoft.Json.Linq; sort keys stably beforehand
var jdp = new JsonDiffPatch();
var patch = jdp.Diff(JToken.Parse(oldVexJson), JToken.Parse(newVexJson));
File.WriteAllText("vex.diff.json", patch?.ToString() ?? "{}");
```
---
If you want, I can turn this into:
- a **.proto** for the cache/index objects,
- a **Mongo schema + indexes** (including compound keys for fast layer reuse),
- and a **.NET 10** service skeleton (`StellaOps.Scanner.WebService`) with endpoints:
`/scan`, `/diff/{from}/{to}`, `/reach/{runId}`.


@@ -0,0 +1,146 @@
Here's a fast, practical idea to speed up container scans: add a **hash-based SBOM layer cache** keyed by **(Docker layer digest + dependency-manifest checksum)** so identical inputs skip recomputation and only verify attestations.
---
### What this is (in plain words)
* **Layers are immutable.** Each image layer already has a content digest (e.g., `sha256:...`).
* **Dependency state is declarative.** Lockfiles/manifest files (NuGet `packages.lock.json`, `package-lock.json`, `poetry.lock`, `go.sum`, etc.) summarize deps.
* If both the **layer bytes** and the **manifest content** are identical to something we've scanned before, recomputing the SBOM/VEX is wasted work. We can **reuse** the previous result (plus a quick signature/attestation check).
---
### Cache key
```
CacheKey = SHA256(
concat(
LayerDigestCanonical, // e.g., "sha256:abcd..."
'\n',
ManifestAlgo, // e.g., "sha256"
':',
ManifestChecksum // hash of lockfile(s) inside the layer FS view
)
)
```
* Optionally include toolchain IDs to prevent cross-version skew:
* `SbomerVersion`, `ScannerRulesetVersion`, `FeedsSnapshotId` (OSV/NVD feed epoch), `PolicyBundleHash`.
---
### When it hits
* **Exact same layer + same manifests** → return the cached **SBOM component graph + vuln findings + VEX** and **re-verify** the **DSSE/in-toto attestation** and timestamps (freshness SLA).
* **Same layer, manifests absent** → fall back to byte-level heuristics (package index cache); lower confidence.
---
### Minimal .NET 10 sketch (StellaOps)
```csharp
public sealed record LayerInput(
string LayerDigest, // "sha256:..."
string? ManifestAlgo, // "sha256"
string? ManifestChecksum, // hex
string SbomerVersion,
string RulesetVersion,
string FeedsSnapshotId,
string PolicyBundleHash);
public static string ComputeCacheKey(LayerInput x)
{
var s = string.Join("\n", new[]{
x.LayerDigest,
x.ManifestAlgo ?? "",
x.ManifestChecksum ?? "",
x.SbomerVersion,
x.RulesetVersion,
x.FeedsSnapshotId,
x.PolicyBundleHash
});
using var sha = System.Security.Cryptography.SHA256.Create();
return Convert.ToHexString(sha.ComputeHash(System.Text.Encoding.UTF8.GetBytes(s)));
}
public sealed class SbomCacheEntry
{
public required string CacheKey { get; init; }
public required byte[] CycloneDxJson { get; init; } // gz if large
public required byte[] VexJson { get; init; }
public required byte[] AttestationDsse { get; init; } // for re-verify
public required DateTimeOffset ProducedAt { get; init; }
public required string FeedsSnapshotId { get; init; } // provenance
}
```
---
### Cache flow (Scanner)
1. **Before scan**
* Extract manifest files from the union FS of the current layer.
* Hash them (stable newline normalization).
* Build `LayerInput`; compute `CacheKey`.
* **Lookup** in `ISbomCache.Get(CacheKey)` (interface sketched after this list).
2. **Hit**
* **Verify attestation** (keys/policy), **check feed epoch** is still within tolerance, **re-sign freshness** if policy allows.
* Emit cached SBOM/VEX downstream; mark provenance as “replayed”.
3. **Miss**
* Run normal analyzers → SBOM → vuln match → VEX lattice.
* Create an **in-toto/DSSE attestation**.
* Store `SbomCacheEntry` and **index by**:
* `CacheKey` (primary),
* `LayerDigest` (secondary),
* `(ecosystem, manifestChecksum)` for diagnostics.
4. **Invalidation**
* Roll cache on **FeedsSnapshotId** bumps or **RulesetVersion** change.
* TTL optional for emergency revocations; keep **attestation+provenance** for audit.
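One possible shape for that cache abstraction (method names are assumptions; the flow above calls it `Get`):

```csharp
public interface ISbomCache
{
    // Primary lookup by cache key; null on miss.
    Task<SbomCacheEntry?> GetAsync(string cacheKey, CancellationToken ct = default);
    Task PutAsync(SbomCacheEntry entry, CancellationToken ct = default);

    // Secondary lookups for diagnostics and invalidation.
    Task<IReadOnlyList<SbomCacheEntry>> FindByLayerDigestAsync(
        string layerDigest, CancellationToken ct = default);
    Task InvalidateByFeedsSnapshotAsync(
        string feedsSnapshotId, CancellationToken ct = default);
}
```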
---
### Storage options
* **Local**: content-addressed dir (`/var/lib/stellaops/sbom-cache/aa/bb/<cacheKey>.cjson.gz`).
* **Remote**: Redis or Mongo (GridFS) keyed by `cacheKey`; attach indexes on `LayerDigest`, `FeedsSnapshotId`.
* **OCI artifact**: push SBOM/VEX as OCI refs tied to the layer digest (helps multi-node CI).
---
### Attestation verification (quick)
* On hit: `Verify(AttestationDsse, Policy)`; ensure `subject.digest == LayerDigest` and metadata (`FeedsSnapshotId`, tool versions) matches required policy.
* Optional **freshness stamp**: a tiny, fast “verification attestation” you produce at replay time.
---
### Edge cases
* **Multi-manifest layers** (polyglot): combine checksums in a stable order (e.g., `SHA256(man1 + '\n' + man2 + ...)`).
* **Runtime-only diffs** (no manifest change): include a **package index snapshot hash** if you maintain one.
* **Reproducibility drift**: include analyzer version & configuration knobs in the key so the cache never masks rule changes.
---
### Why this helps
* Cold scans compute once; subsequent builds (same base image + same lockfiles) **skip minutes of work**.
* Reproducibility becomes **measurable**: cache hit ratio per repo, per base image, per feed epoch.
---
### Quick tasks to add to StellaOps
* [ ] Implement `LayerInput` + keying in `Scanner.WebService`.
* [ ] Add **Manifest Harvester** step per ecosystem (NuGet, npm, pip/poetry, go, Cargo).
* [ ] Add `ISbomCache` (local + Mongo/OCI backends) with metrics.
* [ ] Wire the **attestation re-verify** path on hits.
* [ ] Ship a **cache report**: hit/miss, time saved, reasons for miss (ruleset/feeds changed, manifest changed, new analyzer).
If you want, I can draft the actual C# interfaces (cache backend + verifier) and a tiny integration for your existing `Sbomer`/`Vexer` services next.


@@ -0,0 +1,224 @@
Here's a compact, implementation-ready plan to validate function-level reachability with a public, minimal CVE corpus—one runnable example per runtime (Go, .NET, Python, Rust). It gives you known vulnerable symbols, a tiny app that (optionally) calls them, and captured runtime traces to prove reachability.
---
# Corpus layout
```
stellaops-reach-corpus/
README.md
tooling/
capture-dotnet-eventpipe.ps1
capture-go-trace.sh
capture-python-coverage.sh
capture-rust-probe.sh
go/
CVE-YYYY-XXXX-min/
go.mod
vulner/pkg/vuln.go // vulnerable symbol(s): func DoVuln()
app/main.go // calls or avoids DoVuln() via flag
traces/ // .out/.json from runtime
EXPECT.yaml // ground truth: reachable? call path?
dotnet/
CVE-YYYY-XXXX-min/
src/VulnLib/VulnLib.cs // [MethodImpl] public static void DoVuln()
src/App/App.csproj
src/App/Program.cs // --reach / --no-reach
traces/ // .nettrace, EventPipe JSON, stack dumps
EXPECT.yaml
python/
CVE-YYYY-XXXX-min/
vuln/__init__.py // def do_vuln()
app.py // toggle call via env
requirements.txt
traces/coverage/ // coverage.xml + callgraph.json
EXPECT.yaml
rust/
CVE-YYYY-XXXX-min/
Cargo.toml
src/lib.rs // pub fn do_vuln()
src/main.rs // feature flags: reach/no_reach
traces/ // eBPF/usdt or log-markers
EXPECT.yaml
```
---
# EXPECT.yaml (shared contract)
```yaml
id: CVE-YYYY-XXXX
ecosystem: (go|dotnet|python|rust)
packages:
- name: example.org/vulner
version: 1.0.0
symbols:
- fqname: example.org/vulner.DoVuln # or Namespace.Class.Method, module.func
kind: function
scenarios:
- name: reach
args: ["--reach"]
expected:
reachable: true
call_paths:
- ["app.main", "vulner.DoVuln"]
runtime_hits: >=1
- name: no_reach
args: ["--no-reach"]
expected:
reachable: false
call_paths: []
runtime_hits: 0
artifacts:
- sbom: sbom.cdx.json
- trace: traces/reach.trace
notes: Minimal repro; avoid network/filesystem side effects.
```
---
# Minimal vulnerable symbol patterns
**Go**
`vulner/pkg/vuln.go`
```go
package vulner
func DoVuln(input string) string { return "vuln:" + input } // marker
```
`app/main.go`
```go
package main
import (
"flag"
"example.org/vulner"
"fmt"
)
func main() {
reach := flag.Bool("reach", false, "call vuln")
flag.Parse()
if *reach { fmt.Println(vulner.DoVuln("hit")) } else { fmt.Println("skip") }
}
```
**.NET (C# / .NET 10)**
`VulnLib/VulnLib.cs`
```csharp
namespace VulnLib;
public static class V {
public static string DoVuln(string s) => "vuln:" + s; // marker
}
```
`App/Program.cs`
```csharp
using System;
using VulnLib;
var reach = args.Contains("--reach");
Console.WriteLine(reach ? V.DoVuln("hit") : "skip");
```
**Python**
`vuln/__init__.py`
```python
def do_vuln(s: str) -> str:
return "vuln:" + s # marker
```
`app.py`
```python
import os
from vuln import do_vuln
print(do_vuln("hit") if os.getenv("REACH")=="1" else "skip")
```
**Rust**
`src/lib.rs`
```rust
pub fn do_vuln(s: &str) -> String { format!("vuln:{s}") } // marker
```
`src/main.rs`
```rust
use std::env; use vuln::do_vuln;
fn main() {
let reach = env::args().any(|a| a=="--reach");
println!("{}", if reach { do_vuln("hit") } else { "skip".into() });
}
```
---
# Runtime trace capture (tiny, deterministic)
* **Go**: `-toolexec` or `GODEBUG=efence=1` not required; use `go test -run TestReach -vet=off` (optional) + `pprof` or `runtime/trace`.
* `tooling/capture-go-trace.sh`: `go test ./... -run TestNoop && go test -run TestReach -trace=traces/reach.out`
* **.NET**: EventPipe
* `dotnet-trace collect -p $PID --providers Microsoft-DotNETCore-SampleProfiler:0:5`
* Or `dotnet-monitor collect --duration 5s --process-id ... --artifact-type traces`
* **Python**: `coverage run -m app` + `coverage xml -o traces/coverage/coverage.xml`
* **Rust**: simplest is log markers + `RUST_LOG` capture; optional: `perf record -g` or USDT via `tokio-tracing` if you want call sites.
Each trace folder includes a short `trace.json` (normalized stack hits for the vulnerable symbol) produced by a tiny normalizer script you ship in `tooling/`; a sketch follows.
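A minimal normalizer sketch (the one-frame-per-line input format and the `trace.json` field names are assumptions):

```csharp
// Usage: normalize <raw-trace-file> <symbol> [<symbol> ...]
using System.Text.Json;

var watched = args.Skip(1).ToHashSet();          // e.g. "vulner.DoVuln"
var hits = new Dictionary<string, int>();

foreach (var line in File.ReadLines(args[0]))    // one stack frame per line
    foreach (var sym in watched)
        if (line.Contains(sym, StringComparison.Ordinal))
            hits[sym] = hits.GetValueOrDefault(sym) + 1;

File.WriteAllText("trace.json",
    JsonSerializer.Serialize(new { hits }, new JsonSerializerOptions { WriteIndented = true }));
```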
---
# SBOM & groundtruth
For each example:
* Generate a CycloneDX SBOM (use the language's simplest generator or a tiny script) and include component + symbol annotations (e.g., `properties` with `symbol:fqname`).
* Keep versions pinned to avoid drift.
---
# Validation runner (one command)
`tooling/validate-all.sh`:
1. Build each example twice (reach / no_reach).
2. Capture SBOM + runtime traces.
3. Emit a unified `results.json` with:
* detected symbols from your Symbolizer
* static callgraph reachability
* runtime hit count per symbol
* pass/fail vs `EXPECT.yaml`.
Exit nonzero on any mismatch → perfect for CI gates.
---
# Why this works as a public differentiator
* **Minimal & real**: one tiny, idiomatic app per runtime; clear vulnerable symbol; two scenarios.
* **Auditable**: EXPECT.yaml + traces make results falsifiable.
* **Portable**: no network, no DB; runs in Docker or GitHub Actions.
* **Extensible**: add more CVEs by copying the template and swapping the “vulnerable symbol” (e.g., path-traversal helper, unsafe deserializer stub, weak RNG wrapper).
---
# Next steps I can deliver immediately
* Bootstrap repo with the above structure.
* Add the four first examples + scripts.
* Wire a single `validate-all` CLI to produce a JUnit-style report for your CI.
If you want, I'll generate the skeleton with ready-to-run code, EXPECTs, and the capture scripts tailored to your .NET 10 + Docker workflow.


@@ -0,0 +1,34 @@
Here's a quick, concrete proposal to **lock in a stable SBOM model for StellaOps**: use **SPDX 3.0.1** as your canonical persistence schema and **CycloneDX 1.6** as the interchange “view,” bridged by a deterministic transform.
**Why this pairing**
* **SPDX 3.0.1** gives you a rigorous, profile-based data model (Core/Security/AI/Build, etc.) with explicit **Relationship** semantics—ideal for long-lived storage and graph queries. ([SPDX][1])
* **CycloneDX 1.6** excels at exchange: widely adopted, supports **services/SaaSBOM**, **attestations (CDXA)**, **CBOM (crypto inventory)**, ML-BOM, and more—perfect for producing portable BOMs for customers and regulators. ([CycloneDX][2])
**Target architecture (minimal)**
* **Persistence:** Store SBOMs as SPDX 3.0.1 (JSON-LD/RDF), normalized into your Mongo event-sourced graph; keep Relationship edges first-class. ([SPDX][1])
* **Interchange:** On export, render CycloneDX 1.6 (JSON/XML) including `components`, `services`, `dependencies`, `vulnerabilities`, and optional CBOM/CDXA blocks. ([SBOM Observer][3])
* **Deterministic transform:** Define a static mapping table (SPDX→CycloneDX) with sorted collections, stable UUID seeds, and normalized strings to guarantee byte-for-byte reproducibility across offline sites (see the sketch below).
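A sketch of that determinism discipline: sorted collections plus content-derived identifiers instead of random UUIDs (`SpdxElement`, `CdxComponent`, and `MapToCdxComponent` are assumed names):

```csharp
using System.Security.Cryptography;
using System.Text;

// Same inputs always yield the same bom-ref, across sites and runs.
static Guid StableBomRef(string spdxElementId, string seed)
{
    var hash = SHA256.HashData(Encoding.UTF8.GetBytes($"{seed}|{spdxElementId}"));
    return new Guid(hash.AsSpan(0, 16)); // fold the digest into a deterministic UUID
}

static IEnumerable<CdxComponent> MapComponents(IEnumerable<SpdxElement> elements, string seed) =>
    elements
        .OrderBy(e => e.SpdxId, StringComparer.Ordinal)  // stable, culture-free ordering
        .Select(e => MapToCdxComponent(e, StableBomRef(e.SpdxId, seed)));
```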
**Quick win mapping examples**
* SPDX `Element` + `RelationshipType` → CycloneDX `dependencies` graph. ([SPDX][4])
* SPDX Security profile findings → CycloneDX `vulnerabilities` entries. ([SPDX][1])
* SPDX AI/Build profiles → CycloneDX ML-BOM + CDXA attestations (build/provenance). ([SPDX][5])
* Crypto materials (keys/algos/policies) held in SPDX extensions or attributes → CycloneDX **CBOM** on export for policy checks (CNSA/NIST). ([CycloneDX][2])
**Governance & standards signal**
* SPDX 3.0.x is actively aligned with **OMG/ISO** submissions (a good long-term bet for storage). ([SPDX Lists][6])
* CycloneDX 1.6 is the current, actively enhanced interchange standard used across vendors and tooling. ([GitHub][7])
If you want, I'll draft the exact field-by-field mapping table (SPDX profile → CycloneDX section), plus a small .NET 10 library skeleton for the deterministic exporter.
[1]: https://spdx.github.io/spdx-spec/v3.0.1/ "SPDX Specification 3.0.1"
[2]: https://cyclonedx.org/news/cyclonedx-v1.6-released/ "CycloneDX v1.6 Released, Advances Software Supply ..."
[3]: https://sbom.observer/academy/learn/topics/cyclonedx "What is CycloneDX?"
[4]: https://spdx.github.io/spdx-spec/v3.0.1/model/Core/Vocabularies/RelationshipType/ "RelationshipType - SPDX Specification 3.0.1"
[5]: https://spdx.dev/wp-content/uploads/sites/31/2024/12/SPDX-3.0.1-1.pdf "SPDX Specification v3.0.1"
[6]: https://lists.spdx.org/g/Spdx-tech/topic/release_3_0_1_of_the_spdx/110308825 "Release 3.0.1 of the SPDX Specification"
[7]: https://github.com/CycloneDX/specification "CycloneDX/specification"


@@ -0,0 +1,132 @@
Here's a practical, plain-English game plan to validate three big StellaOps claims—quiet scans, provenance, and diff-native CI—so you (and auditors/customers) can reproduce the results end-to-end.
---
# 1) “Explainably quiet by design”
**Goal:** Fewer false alarms, with every suppression justified (reachability/VEX), and every alert deduplicated and actionable.
**What to measure** (a small computation sketch follows this list)
* **Noise rate:** total findings vs. actionable (has fix/KB/CWE + reachable or policy-relevant).
* **Dedup:** identical CVE across layers/repos counted once.
* **Explainability:** % of findings with a clear path (package → symbol/function → evidence).
* **Suppression justifications:** % of suppressed items with VEX reason (not affected, configuration, environment, reachability).
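Computing those from the per-repo JSONL is mechanical; an illustrative sketch (the `Finding` shape and field names are assumptions):

```csharp
record Finding(string CveId, string Purl, bool HasFix, bool Reachable,
               bool PolicyRelevant, bool Suppressed, string? VexReason);

static (double NoiseRate, double DedupRatio, double JustifiedShare) Metrics(
    IReadOnlyList<Finding> findings)
{
    // Mirrors the definition above: has a fix and is reachable or policy-relevant.
    bool Actionable(Finding f) => f.HasFix && (f.Reachable || f.PolicyRelevant);

    var total      = findings.Count;
    var actionable = findings.Count(Actionable);
    var unique     = findings.Select(f => (f.CveId, f.Purl)).Distinct().Count();
    var suppressed = findings.Where(f => f.Suppressed).ToList();
    var justified  = suppressed.Count(f => !string.IsNullOrEmpty(f.VexReason));

    return (
        NoiseRate:      total == 0 ? 0 : 1.0 - (double)actionable / total,
        DedupRatio:     total == 0 ? 1 : (double)unique / total,
        JustifiedShare: suppressed.Count == 0 ? 1 : (double)justified / suppressed.Count);
}
```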
**A/B test setup**
* **Repos (representative mix):** .NET (ASP.NET app & library), JVM (Spring), Node/TS (Nest), Python (FastAPI), Go (CLI), container base images (Alpine, Debian, Ubuntu), and a known-noisy monorepo.
* **Modes:** `baseline=no VEX/reach`, `quiet=reach+VEX+dedup`.
* **Metrics capture:** emit JSONL per repo with counts and examples.
**Minimal harness (pseudo)**
```bash
# baseline
stella scan repo --out baseline.jsonl --no-reach --no-vex --no-dedup
# quiet
stella scan repo --out quiet.jsonl --reach --vex openvex.json --dedup
stella explain --in quiet.jsonl --evidence callgraph,eventpipe --why > explain.md
stella metrics compare baseline.jsonl quiet.jsonl > ab_summary.md
```
**Pass criteria (suggested)**
* ≥50% reduction in non-actionable alerts.
* 100% of suppressions carry VEX+reason.
* ≥90% of actionable findings link to evidence (reachable symbol or policy gate).
---
# 2) “Provenance-first DevSecOps”
**Goal:** Ship a verifiable bundle anyone can check offline: SBOM + attestations + transparency-log proof.
**What to export**
* **SBOM:** CycloneDX 1.6 or SPDX 3.0.1.
* **Provenance attestation:** in-toto/DSSE (builder, materials, recipe, digest).
* **Signatures:** Sigstore (cosign) or regional crypto (pluggable).
* **Transparency log receipt:** Rekor (or mirror) inclusion proof.
* **Policy snapshot:** the exact policy/lattice and feed hashes used.
* **Repro manifest:** declarative inputs so scans are replayable.
**One-shot exporter**
```bash
stella bundle export \
--sbom cyclonedx.json \
--attest provenance.intoto.jsonl \
--sig cosign.sig \
--rekor-inclusion rekor.json \
--policy policy.yml \
--replay manifest.lock.json \
--out stella-proof-bundle.tgz
```
**Independent verification (clean machine)**
```bash
stella bundle verify stella-proof-bundle.tgz \
--check-sig --check-rekor --check-sbom --check-policy --replay
# Output should show digest matches, valid DSSE, Rekor inclusion, and replay parity.
```
**Pass criteria**
* All cryptographic checks pass offline.
* Replay produces a byte-identical findings set (or a diff limited to time-varying feeds pinned by hash).
---
# 3) “Diff-native CI for containers”
**Goal:** Rescan only what changed (layers/deps/policies) with equal detection parity and lower wall time.
**Test matrix**
* **Images:** multi-stage app (runtime+deps), language runtimes (dotnet, jre, node, python), and a “fat” base (ubuntu:XX).
* **Changes:** Dockerfile ENV only, add/remove package, patch app DLL/JAR/JS, policy toggle.
**Runs**
```bash
# Full scan
time stella image scan myimg:old > full_old.json
time stella image scan myimg:new > full_new.json
# Diff-aware
time stella image scan myimg:new --diff-from myimg:old --cache .stella-cache > diff_new.json
stella parity check full_new.json diff_new.json > parity.md
```
**Metrics**
* **Parity:** same actionable findings IDs (allowing dedup).
* **Speedup:** (full time) / (diff time).
* **Cache hit ratio:** reused layers/components.
**Pass criteria**
* 100% actionable parity on modified images.
* ≥3× faster on typical “small change” commits; no worse than full scan when cache misses.
---
## What you'll publish (deliverables)
* `VALIDATION_PLAN.md` — steps above with fixed seeds (image digests, repo SHAs).
* `harness/` — scripts to run A/B and diff tests, export bundles, and verify.
* `results/YYYY-MM/` — raw JSONL, parity reports, timing tables, and a 1-page summary.
* `policy/` — locked policy + feed hashes used in the runs.
---
## Nice-to-have extras
* **Reachability/VEX gallery:** a few “before/after” call graphs and suppression cards.
* **Auditor mode:** `stella audit open stella-proof-bundle.tgz` → read-only UI that renders SBOM, VEX, signatures, Rekor proof, and the replay log.
* **CI examples:** GitLab/GitHub YAML snippets for full vs. diff jobs with caching.
If you want, I can spit out the repo-ready scaffold (folders, stub scripts, sample policies) tailored to your .NET 10 + Docker setup so you can run this tonight.