Tests, fixes, and sprint work

This commit is contained in:
master
2026-01-22 19:08:46 +02:00
parent c32fff8f86
commit 726d70dc7f
881 changed files with 134434 additions and 6228 deletions

View File

@@ -39,6 +39,7 @@ Key settings:
- `subject`: sha256 (+ optional sha512) digest of the bundle target.
- `timestamps`: RFC3161/eIDAS timestamp entries with TSA chain/OCSP/CRL refs.
- `rekorProofs`: entry body/inclusion proof paths plus signed entry timestamp for offline verification.
- Inline artifacts (no `path`) are capped at 4 MiB; larger artifacts are written under `artifacts/`.
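The inline cap above can be sketched as a simple placement rule. This is an illustrative Python sketch, not the actual writer implementation; the entry field names (`inline`, `path`) are assumptions.

```python
import base64

MAX_INLINE_BYTES = 4 * 1024 * 1024  # 4 MiB cap on inline artifacts

def place_artifact(name: str, data: bytes) -> dict:
    """Return a manifest entry: inline payload under the cap, path reference above it."""
    if len(data) <= MAX_INLINE_BYTES:
        return {"name": name, "inline": base64.b64encode(data).decode("ascii")}
    # Larger artifacts are written under artifacts/ and referenced by path.
    return {"name": name, "path": f"artifacts/{name}"}
```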
## Dependencies
@@ -55,6 +56,63 @@ Key settings:
- Mirror: `../mirror/`
- ExportCenter: `../export-center/`
## Evidence Bundles for Air-Gapped Verification
The AirGap module supports golden corpus evidence bundles for offline verification of patch provenance. These bundles enable auditors to verify security patch status without network access.
### Bundle Contents
Evidence bundles follow the OCI format and contain:
- Pre/post binaries with debug symbols
- Canonical SBOM for each binary
- DSSE delta-sig predicate proving patch status
- Build provenance (if available from buildinfo)
- RFC 3161 timestamps for each signed artifact
- Validation run results and KPIs
### Bundle Export
```bash
stella groundtruth bundle export \
--packages openssl,zlib,glibc \
--distros debian,fedora \
--output symbol-bundle.tar.gz \
--sign-with cosign
```
### Bundle Import and Verification
```bash
stella groundtruth bundle import \
--input symbol-bundle.tar.gz \
--verify-signature \
--trusted-keys /etc/stellaops/trusted-keys.pub \
--output verification-report.md
```
### Standalone Verifier
For air-gapped environments without the full Stella Ops stack, use the standalone verifier:
```bash
stella-verifier verify \
--bundle evidence-bundle.oci.tar \
--trusted-keys trusted-keys.pub \
--trust-profile eu-eidas.trustprofile.json \
--output report.json
```
Exit codes:
- `0`: All verifications passed
- `1`: One or more verifications failed
- `2`: Invalid input or configuration error
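In automation, the exit codes above can be mapped to outcomes with a thin wrapper. A minimal sketch, assuming `stella-verifier` is invoked as an external command; the wrapper itself is illustrative.

```python
import subprocess
import sys

EXIT_MEANINGS = {
    0: "all verifications passed",
    1: "one or more verifications failed",
    2: "invalid input or configuration error",
}

def run_verifier(argv: list[str]) -> tuple[int, str]:
    """Run the verifier command and map its exit code to the documented meaning."""
    rc = subprocess.run(argv).returncode
    return rc, EXIT_MEANINGS.get(rc, "unknown exit code")
```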
### Related Documentation
- [Golden Corpus Layout](../binary-index/golden-corpus-layout.md)
- [Golden Corpus Maintenance](../binary-index/golden-corpus-maintenance.md)
- [Golden Corpus Operations Runbook](../../runbooks/golden-corpus-operations.md)
## Current Status
Implemented with Controller for snapshot export and Importer for secure ingestion. Staleness policies enforce time-bound validity. Integrated with ExportCenter for bundle packaging and all data modules for content export/import.

View File

@@ -17,7 +17,7 @@ Stella Ops generates rich data through SBOM ingestion, vulnerability correlation
|------------|-------------|
| Unified component registry | Canonical component table with normalized suppliers and licenses |
| Vulnerability correlation | Pre-joined component-vulnerability mapping with EPSS/KEV flags |
| VEX-adjusted exposure | Vulnerability counts that respect VEX overrides |
| VEX-adjusted exposure | Vulnerability counts that respect active VEX overrides (validity windows applied) |
| Attestation tracking | Provenance and SLSA level coverage by environment/team |
| Time-series rollups | Daily snapshots for trend analysis |
| Materialized views | Pre-computed aggregations for dashboard performance |
@@ -68,6 +68,14 @@ Stella Ops generates rich data through SBOM ingestion, vulnerability correlation
| `daily_vulnerability_counts` | Rollup | Daily vuln aggregations |
| `daily_component_counts` | Rollup | Daily component aggregations |
Rollup retention is 90 days in hot storage. `compute_daily_rollups()` prunes
older rows after each run; archival follows operations runbooks.
Platform WebService can automate rollups + materialized view refreshes via
`PlatformAnalyticsMaintenanceService` (see `architecture.md` for schedule and
configuration).
Use `Platform:AnalyticsMaintenance:BackfillDays` to recompute the most recent
N days of rollups on the first maintenance run after downtime (set to `0` to disable).
### Materialized Views
| View | Refresh | Purpose |
@@ -77,33 +85,36 @@ Stella Ops generates rich data through SBOM ingestion, vulnerability correlation
| `mv_vuln_exposure` | Daily | CVE exposure adjusted by VEX |
| `mv_attestation_coverage` | Daily | Provenance/SLSA coverage by env/team |
Array-valued fields (for example `environments` and `ecosystems`) are ordered
alphabetically to keep analytics outputs deterministic.
## Quick Start
### Day-1 Queries
**Top supplier concentration (supply chain risk):**
**Top supplier concentration (supply chain risk, optional environment filter):**
```sql
SELECT * FROM analytics.sp_top_suppliers(20);
SELECT analytics.sp_top_suppliers(20, 'prod');
```
**License risk heatmap:**
**License risk heatmap (optional environment filter):**
```sql
SELECT * FROM analytics.sp_license_heatmap();
SELECT analytics.sp_license_heatmap('prod');
```
**CVE exposure adjusted by VEX:**
```sql
SELECT * FROM analytics.sp_vuln_exposure('prod', 'high');
SELECT analytics.sp_vuln_exposure('prod', 'high');
```
**Fixable vulnerability backlog:**
```sql
SELECT * FROM analytics.sp_fixable_backlog('prod');
SELECT analytics.sp_fixable_backlog('prod');
```
**Attestation coverage gaps:**
```sql
SELECT * FROM analytics.sp_attestation_gaps('prod');
SELECT analytics.sp_attestation_gaps('prod');
```
### API Endpoints
@@ -118,6 +129,82 @@ SELECT * FROM analytics.sp_attestation_gaps('prod');
| `/api/analytics/trends/vulnerabilities` | GET | Vulnerability time-series |
| `/api/analytics/trends/components` | GET | Component time-series |
All analytics endpoints require the `analytics.read` scope.
The platform metadata capability `analytics` reports whether analytics storage is configured.
#### Query Parameters
- `/api/analytics/suppliers`: `limit` (optional, default 20), `environment` (optional)
- `/api/analytics/licenses`: `environment` (optional)
- `/api/analytics/vulnerabilities`: `minSeverity` (optional, default `low`), `environment` (optional)
- `/api/analytics/backlog`: `environment` (optional)
- `/api/analytics/attestation-coverage`: `environment` (optional)
- `/api/analytics/trends/vulnerabilities`: `environment` (optional), `days` (optional, default 30)
- `/api/analytics/trends/components`: `environment` (optional), `days` (optional, default 30)
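Since every parameter above is optional, clients should omit unset values rather than send empty strings. A small sketch of composing a request URL (the host is illustrative):

```python
from urllib.parse import urlencode

def analytics_url(base: str, path: str, **params) -> str:
    """Build an analytics endpoint URL, omitting parameters left unset (None)."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    return f"{base}{path}?{query}" if query else f"{base}{path}"

url = analytics_url(
    "https://stellaops.example.internal",  # illustrative host
    "/api/analytics/trends/vulnerabilities",
    environment="prod",
    days=30,
)
```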
## Ingestion Configuration
Analytics ingestion runs inside the Platform WebService and subscribes to Scanner, Concelier, and Attestor streams. Configure ingestion via `Platform:AnalyticsIngestion`:
```yaml
Platform:
Storage:
PostgresConnectionString: "Host=...;Database=analytics;Username=...;Password=..."
AnalyticsIngestion:
Enabled: true
PostgresConnectionString: "" # optional; defaults to Platform:Storage
AllowedTenants: ["tenant-a", "tenant-b"]
Streams:
ScannerStream: "orchestrator:events"
ConcelierObservationStream: "concelier:advisory.observation.updated:v1"
ConcelierLinksetStream: "concelier:advisory.linkset.updated:v1"
AttestorStream: "attestor:events"
StartFromBeginning: false
Cas:
RootPath: "/var/lib/stellaops/cas"
DefaultBucket: "attestations"
Attestations:
BundleUriTemplate: "bundle:{digest}"
```
Bundle URI templates support:
- `{digest}` for the full digest string (for example `sha256:...`).
- `{hash}` for the raw hex digest (no algorithm prefix).
- `bundle:{digest}` which resolves to `cas://<DefaultBucket>/{digest}` by default.
- `file:/path/to/bundles/bundle-{hash}.json` for offline file ingestion.
For offline workflows, verify bundles with `stella bundle verify` before ingesting them.
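The template substitutions above can be sketched as follows; this mirrors the documented behavior (`{digest}`, `{hash}`, and the `bundle:` → `cas://<DefaultBucket>/` default) but is illustrative, not the ingestion service's code.

```python
def resolve_bundle_uri(template: str, digest: str, default_bucket: str = "attestations") -> str:
    """Expand {digest}/{hash} placeholders and map the bundle: scheme to CAS storage."""
    raw_hex = digest.split(":", 1)[1] if ":" in digest else digest
    uri = template.replace("{digest}", digest).replace("{hash}", raw_hex)
    if uri.startswith("bundle:"):
        # bundle:{digest} resolves to cas://<DefaultBucket>/{digest} by default.
        uri = f"cas://{default_bucket}/" + uri[len("bundle:"):]
    return uri
```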
## Console UI
SBOM Lake analytics are exposed in the Console under `Analytics > SBOM Lake` (`/analytics/sbom-lake`).
Console access requires `ui.read` plus `analytics.read` scopes.
Key UI features:
- Filters for environment, minimum severity, and time window.
- Panels for suppliers, licenses, vulnerability exposure, and attestation coverage.
- Trend views for vulnerabilities and components.
- Fixable backlog table with CSV export.
See [console.md](./console.md) for operator guidance and filter behavior.
## CLI Access
SBOM lake analytics are exposed via the CLI under `stella analytics sbom-lake`
(requires `analytics.read` scope).
```bash
# Top suppliers
stella analytics sbom-lake suppliers --limit 20
# Vulnerability exposure in prod (high+), CSV export
stella analytics sbom-lake vulnerabilities --environment prod --min-severity high --format csv --output vuln.csv
# 30-day trends for both series
stella analytics sbom-lake trends --days 30 --series all --format json
```
See `docs/modules/cli/guides/commands/analytics.md` for command-level details.
## Architecture
See [architecture.md](./architecture.md) for detailed design decisions, data flow, and normalization rules.
@@ -133,4 +220,6 @@ See [analytics_schema.sql](../../db/analytics_schema.sql) for complete DDL inclu
## Sprint Reference
Implementation tracked in: `docs/implplan/SPRINT_20260120_030_Platform_sbom_analytics_lake.md`
Implementation tracked in:
- `docs/implplan/SPRINT_20260120_030_Platform_sbom_analytics_lake.md`
- `docs/implplan/SPRINT_20260120_032_Cli_sbom_analytics_cli.md`

View File

@@ -7,7 +7,7 @@ The Analytics module implements a **star-schema data warehouse** pattern optimiz
1. **Separation of concerns**: Analytics schema is isolated from operational schemas (scanner, vex, proof_system)
2. **Pre-computation**: Expensive aggregations computed in advance via materialized views
3. **Audit trail**: Raw payloads preserved for reprocessing and compliance
4. **Determinism**: All normalization functions are immutable and reproducible
4. **Determinism**: Normalization functions are immutable and reproducible; array aggregates are ordered for stable outputs
5. **Incremental updates**: Supports both full refresh and incremental ingestion
## Data Flow
@@ -120,10 +120,9 @@ When a component is upserted, the `VulnerabilityCorrelationService` queries Conc
2. Filter by version range matching
3. Upsert to `component_vulns` with severity, EPSS, KEV flags
**Version range matching** uses Concelier's existing logic to handle:
- Semver ranges: `>=1.0.0 <2.0.0`
- Exact versions: `1.2.3`
- Wildcards: `1.x`
**Version range matching** currently supports semver ranges and exact matches via
`VersionRuleEvaluator`. Non-semver schemes fall back to exact string matches; wildcard
and ecosystem-specific ranges require upstream normalization.
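The matching behavior described above (semver range clauses, exact-string fallback for non-semver schemes) can be sketched as follows. This is an illustration of the rules, not the `VersionRuleEvaluator` implementation.

```python
import re

SEMVER = re.compile(r"^\d+\.\d+\.\d+$")

def parse_semver(v: str) -> tuple[int, int, int]:
    major, minor, patch = v.split(".")
    return int(major), int(minor), int(patch)

def matches(version: str, rule: str) -> bool:
    """Semver ranges like '>=1.0.0 <2.0.0' and exact matches; non-semver falls back to string equality."""
    if not SEMVER.match(version):
        return version == rule  # non-semver scheme: exact string match only
    if SEMVER.match(rule):
        return version == rule  # exact semver match
    v = parse_semver(version)
    for clause in rule.split():
        op, bound = re.match(r"^(>=|<=|<|>|=)?(.+)$", clause).groups()
        b = parse_semver(bound)
        ok = {">=": v >= b, "<=": v <= b, "<": v < b, ">": v > b, "=": v == b, None: v == b}[op]
        if not ok:
            return False
    return True
```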
## VEX Override Logic
@@ -145,7 +144,21 @@ COUNT(DISTINCT ac.artifact_id) FILTER (
**Override validity:**
- `valid_from`: When the override became effective
- `valid_until`: Expiration (NULL = no expiration)
- Only `status = 'not_affected'` reduces exposure counts
- Only `status = 'not_affected'` reduces exposure counts, and only when the override is active in its validity window.
## Attestation Ingestion
Attestation ingestion consumes Attestor Rekor entry events and expects Sigstore bundles
or raw DSSE envelopes. The ingestion service:
- Resolves bundle URIs using `BundleUriTemplate`; `bundle:{digest}` maps to
`cas://<DefaultBucket>/{digest}` by default.
- Decodes DSSE payloads, computes `dsse_payload_hash`, and records `predicate_uri` plus
Rekor log metadata (`rekor_log_id`, `rekor_log_index`).
- Uses in-toto `subject` digests to link artifacts when reanalysis hints are absent.
- Maps predicate URIs into `analytics_attestation_type` values
(`provenance`, `sbom`, `vex`, `build`, `scan`, `policy`).
- Expands VEX statements into `vex_overrides` rows, one per product reference, and
captures optional validity timestamps when provided.
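Computing `dsse_payload_hash` can be sketched as decoding the envelope's base64 payload and hashing it. The envelope field names follow the DSSE envelope format; the hash prefix convention is an assumption.

```python
import base64
import hashlib
import json

def dsse_payload_hash(envelope_json: str) -> str:
    """Decode the DSSE envelope's base64 payload and hash it for dsse_payload_hash."""
    envelope = json.loads(envelope_json)
    payload = base64.b64decode(envelope["payload"])
    return "sha256:" + hashlib.sha256(payload).hexdigest()
```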
## Time-Series Rollups
@@ -164,14 +177,14 @@ Daily rollups computed by `compute_daily_rollups()`:
- `total_components`: Distinct components
- `unique_suppliers`: Distinct normalized suppliers
**Retention policy:** 90 days in hot storage; older data archived to cold storage.
**Retention policy:** 90 days in hot storage; `compute_daily_rollups()` prunes older rows and downstream jobs archive to cold storage.
## Materialized View Refresh
All materialized views support `REFRESH ... CONCURRENTLY` for zero-downtime updates:
```sql
-- Refresh all views (run daily via pg_cron or Scheduler)
-- Refresh all views (non-concurrent; run off-peak)
SELECT analytics.refresh_all_views();
```
@@ -182,6 +195,19 @@ SELECT analytics.refresh_all_views();
- `mv_attestation_coverage`: 02:45 UTC daily
- `compute_daily_rollups()`: 03:00 UTC daily
Platform WebService can run the daily rollup + refresh loop via
`PlatformAnalyticsMaintenanceService`. Configure the schedule with:
- `Platform:AnalyticsMaintenance:Enabled` (default `true`)
- `Platform:AnalyticsMaintenance:IntervalMinutes` (default `1440`)
- `Platform:AnalyticsMaintenance:RunOnStartup` (default `true`)
- `Platform:AnalyticsMaintenance:ComputeDailyRollups` (default `true`)
- `Platform:AnalyticsMaintenance:RefreshMaterializedViews` (default `true`)
- `Platform:AnalyticsMaintenance:BackfillDays` (default `0` = disabled; when set to N, recomputes the most recent N days of rollups on the first maintenance run)
The hosted service issues concurrent refresh statements directly for each view.
Use a DB scheduler (pg_cron) or external orchestrator if you need the staggered
per-view timing above.
## Performance Considerations
### Indexing Strategy
@@ -198,9 +224,9 @@ SELECT analytics.refresh_all_views();
| Query | Target | Notes |
|-------|--------|-------|
| `sp_top_suppliers(20)` | < 100ms | Uses materialized view |
| `sp_license_heatmap()` | < 100ms | Uses materialized view |
| `sp_vuln_exposure()` | < 200ms | Uses materialized view |
| `sp_top_suppliers(20, 'prod')` | < 100ms | Uses materialized view when environment is null; environment filter reads base tables |
| `sp_license_heatmap('prod')` | < 100ms | Uses materialized view when environment is null; environment filter reads base tables |
| `sp_vuln_exposure()` | < 200ms | Uses materialized view when environment is null; environment filter reads base tables |
| `sp_fixable_backlog()` | < 500ms | Live query with indexes |
| `sp_attestation_gaps()` | < 100ms | Uses materialized view |
@@ -246,12 +272,12 @@ All tables include `created_at` and `updated_at` timestamps. Raw payload tables
### Upstream Dependencies
| Service | Event | Action |
|---------|-------|--------|
| Scanner | SBOM ingested | Normalize and upsert components |
| Concelier | Advisory updated | Re-correlate affected components |
| Excititor | VEX observation | Create/update vex_overrides |
| Attestor | Attestation created | Upsert attestation record |
| Service | Event | Contract | Action |
|---------|-------|----------|--------|
| Scanner | SBOM report ready | `scanner.event.report.ready@1` (`docs/modules/signals/events/orchestrator-scanner-events.md`) | Normalize and upsert components |
| Concelier | Advisory observation/linkset updated | `advisory.observation.updated@1` (`docs/modules/concelier/events/advisory.observation.updated@1.schema.json`), `advisory.linkset.updated@1` (`docs/modules/concelier/events/advisory.linkset.updated@1.md`) | Re-correlate affected components |
| Excititor | VEX statement changes | `vex.statement.*` (`docs/modules/excititor/architecture.md`) | Create/update vex_overrides |
| Attestor | Rekor entry logged | `rekor.entry.logged` (`docs/modules/attestor/architecture.md`) | Upsert attestation record |
### Downstream Consumers

View File

@@ -0,0 +1,64 @@
# Analytics Console (SBOM Lake)
The Console exposes SBOM analytics lake data under `Analytics > SBOM Lake`.
This view is read-only and uses the analytics API endpoints documented in `docs/modules/analytics/README.md`.
## Access
- Route: `/analytics/sbom-lake`
- Required scopes: `ui.read` and `analytics.read`
- Console admin bundles: `role/analytics-viewer`, `role/analytics-operator`, `role/analytics-admin`
- Data freshness: the page surfaces the latest `dataAsOf` timestamp returned by the API.
## Filters
The SBOM Lake page supports three filters that round-trip via URL query parameters:
- Environment: `env` (optional, example: `Prod`)
- Minimum severity: `severity` (optional, example: `high`)
- Time window (days): `days` (optional, example: `90`)
When a filter changes, the Console reloads all panels using the updated parameters.
Supplier and license panels honor the environment filter alongside the other views.
## Panels
The dashboard presents four summary panels:
1. Supplier concentration (top suppliers by component count)
2. License distribution (license categories and counts)
3. Vulnerability exposure (top CVEs after VEX adjustments)
4. Attestation coverage (provenance and SLSA 2+ coverage)
Each panel shows a loading state, empty state, and summary counts.
## Trends
Two trend panels are included:
- Vulnerability trend: net exposure over the selected time window
- Component trend: total components and unique suppliers
The Console aggregates trend points by date and renders a simple bar chart plus a compact list.
## Fixable Backlog
The fixable backlog table lists vulnerabilities with fixes available, grouped by component and service.
The "Top backlog components" table derives a component summary from the same backlog data.
### CSV Export
The "Export backlog CSV" action downloads a deterministic, ordered CSV with:
- Service
- Component
- Version
- Vulnerability
- Severity
- Environment
- Fixed version
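A deterministic, ordered export can be sketched by sorting rows on the full column tuple before writing; this is an illustration of the property, not the Console's exporter.

```python
import csv
import io

COLUMNS = ["Service", "Component", "Version", "Vulnerability",
           "Severity", "Environment", "Fixed version"]

def backlog_csv(rows: list[dict]) -> str:
    """Emit rows sorted on all columns so repeated exports are byte-identical."""
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    for row in sorted(rows, key=lambda r: tuple(r.get(c, "") for c in COLUMNS)):
        writer.writerow(row)
    return out.getvalue()
```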
## Troubleshooting
- If panels show "No data", verify that the analytics schema and materialized views are populated.
- If an error banner appears, check the analytics API availability and ensure the tenant has `analytics.read`.

View File

@@ -9,8 +9,8 @@ This document provides ready-to-use SQL queries for common analytics use cases.
Identifies suppliers with the highest component footprint, indicating supply chain concentration risk.
```sql
-- Via stored procedure (recommended)
SELECT * FROM analytics.sp_top_suppliers(20);
-- Via stored procedure (recommended, optional environment filter)
SELECT analytics.sp_top_suppliers(20, 'prod');
-- Direct query
SELECT
@@ -33,8 +33,8 @@ LIMIT 20;
Shows distribution of components by license category for compliance review.
```sql
-- Via stored procedure
SELECT * FROM analytics.sp_license_heatmap();
-- Via stored procedure (optional environment filter)
SELECT analytics.sp_license_heatmap('prod');
-- Direct query with grouping
SELECT
@@ -62,9 +62,9 @@ Shows true vulnerability exposure after applying VEX mitigations.
```sql
-- Via stored procedure
SELECT * FROM analytics.sp_vuln_exposure('prod', 'high');
SELECT analytics.sp_vuln_exposure('prod', 'high');
-- Direct query showing VEX effectiveness
-- Direct query showing VEX effectiveness (global view; use sp_vuln_exposure for environment filtering)
SELECT
vuln_id,
severity::TEXT,
@@ -97,7 +97,7 @@ Lists vulnerabilities that can be fixed today (fix available, not VEX-mitigated)
```sql
-- Via stored procedure
SELECT * FROM analytics.sp_fixable_backlog('prod');
SELECT analytics.sp_fixable_backlog('prod');
-- Direct query with priority scoring
SELECT
@@ -130,6 +130,7 @@ JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
LEFT JOIN analytics.vex_overrides vo ON vo.artifact_id = a.artifact_id
AND vo.vuln_id = cv.vuln_id
AND vo.status = 'not_affected'
AND vo.valid_from <= now()
AND (vo.valid_until IS NULL OR vo.valid_until > now())
WHERE cv.affects = TRUE
AND cv.fix_available = TRUE
@@ -147,7 +148,7 @@ Shows attestation gaps by environment and team.
```sql
-- Via stored procedure
SELECT * FROM analytics.sp_attestation_gaps('prod');
SELECT analytics.sp_attestation_gaps('prod');
-- Direct query with gap analysis
SELECT
@@ -267,6 +268,7 @@ JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
LEFT JOIN analytics.vex_overrides vo ON vo.artifact_id = a.artifact_id
AND vo.vuln_id = cv.vuln_id
AND vo.valid_from <= now()
AND (vo.valid_until IS NULL OR vo.valid_until > now())
WHERE cv.vuln_id = 'CVE-2021-44228'
ORDER BY a.environment, a.name;
@@ -312,7 +314,7 @@ SELECT
c.license_category::TEXT,
c.supplier_normalized AS supplier,
COUNT(DISTINCT a.artifact_id) AS artifact_count,
ARRAY_AGG(DISTINCT a.name) AS affected_artifacts
ARRAY_AGG(DISTINCT a.name ORDER BY a.name) AS affected_artifacts
FROM analytics.components c
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
@@ -340,6 +342,8 @@ SELECT
FROM analytics.component_vulns cv
JOIN analytics.vex_overrides vo ON vo.vuln_id = cv.vuln_id
AND vo.status = 'not_affected'
AND vo.valid_from <= now()
AND (vo.valid_until IS NULL OR vo.valid_until > now())
WHERE cv.published_at >= now() - INTERVAL '90 days'
AND cv.published_at IS NOT NULL
GROUP BY cv.severity

View File

@@ -14,7 +14,7 @@ StellaOps SBOM interoperability tests ensure compatibility with third-party secu
| SPDX | 3.0.1 | ✅ Supported | 95%+ |
Notes:
- SPDX 3.0.1 generation currently emits JSON-LD `@context`, `spdxVersion`, core document/package/relationship elements, software package/file/snippet metadata, build profile elements with output relationships, security vulnerabilities with assessment relationships, verifiedUsing hashes/signatures, and external references/identifiers. Full profile coverage is tracked in SPRINT_20260119_014.
- SPDX 3.0.1 generation currently emits:
  - JSON-LD `@context`, `spdxVersion`, and core document/package/relationship elements (including agent/tool elements for creationInfo)
  - software package/file/snippet metadata
  - build profile elements with output relationships
  - security vulnerabilities with assessment relationships
  - licensing license elements with declared/concluded relationships
  - AI AIPackage metadata (autonomy, domain, metrics, safety risk assessment)
  - Dataset package metadata (type, collection, preprocessing, availability)
  - verifiedUsing hashes/signatures
  - external references/identifiers (including externalRef contentType when available)
  - namespaceMap/imports for cross-document references
  - extension metadata via SbomExtension namespace/properties on document/component/vulnerability elements
  - Lite profile output (opt-in via SpdxWriterOptions.UseLiteProfile)

  Full profile coverage is tracked in SPRINT_20260119_014.
### Third-Party Tools

View File

@@ -29,11 +29,14 @@ Use the bundle verification flow aligned to domain operations:
```bash
stella bundle verify --bundle /path/to/bundle --offline --trust-root /path/to/tsa-root.pem --rekor-checkpoint /path/to/checkpoint.json
stella bundle verify --bundle /path/to/bundle --offline --signer /path/to/report-key.pem --signer-cert /path/to/report-cert.pem
```
Notes:
- Offline mode fails closed when revocation evidence is missing or invalid.
- Trust roots must be provided locally; no network fetches are allowed.
- When `--signer` is set, a DSSE report is written to `out/verification.report.json`.
- Signed report metadata includes `verifier.algo`, `verifier.cert`, `signed_at`.
## 4. Verification Behavior

View File

@@ -1239,7 +1239,183 @@ binaryindex:
---
## 10. References
## 10. Golden Corpus for Patch Provenance
> **Sprint:** SPRINT_20260121_034/035/036 - Golden Corpus Implementation
The BinaryIndex module supports a **golden corpus** of patch-paired artifacts that enables offline SBOM reproducibility and binary-level patch provenance verification.
### 10.1 Corpus Purpose
The golden corpus provides:
- **Auditor-ready evidence bundles** for air-gapped customers
- **Regression testing** for binary matching accuracy
- **Proof of patch status** independent of package metadata
### 10.2 Corpus Sources
| Source | Type | Purpose |
|--------|------|---------|
| Debian Security Tracker / DSAs | Advisory | Primary advisory linkage |
| Debian Snapshot | Binary archive | Pre/post patch binary pairs |
| Ubuntu Security Notices | Advisory | Ubuntu-specific advisories |
| Alpine secdb | Advisory | Alpine YAML advisories |
| OSV dump | Unified schema | Cross-reference and commit ranges |
### 10.2.1 Symbol Source Connectors
> **Sprint:** SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli
The corpus ingestion layer uses pluggable connectors to retrieve symbols and metadata from upstream sources:
| Connector ID | Implementation | Protocol | Data Retrieved |
|--------------|----------------|----------|----------------|
| `debuginfod-fedora` | `DebuginfodConnector` | debuginfod HTTP | ELF debug symbols by Build-ID |
| `debuginfod-ubuntu` | `DebuginfodConnector` | debuginfod HTTP | ELF debug symbols by Build-ID |
| `ddeb-ubuntu` | `DdebConnector` | APT/HTTP | `.ddeb` debug packages |
| `buildinfo-debian` | `BuildinfoConnector` | HTTP | `.buildinfo` reproducibility records |
| `secdb-alpine` | `AlpineSecDbConnector` | Git/HTTP | `secfixes` YAML from APKBUILD |
**Connector Interface:**
```csharp
public interface ISymbolSourceConnector
{
string ConnectorId { get; }
string DisplayName { get; }
string[] SupportedDistros { get; }
Task<ConnectorStatus> GetStatusAsync(CancellationToken ct);
Task SyncAsync(SyncOptions options, CancellationToken ct);
Task<SymbolLookupResult?> LookupByBuildIdAsync(string buildId, CancellationToken ct);
Task<IAsyncEnumerable<SymbolRecord>> SearchAsync(SymbolSearchQuery query, CancellationToken ct);
}
```
**Debuginfod Connector:**
The `DebuginfodConnector` implements the [debuginfod protocol](https://sourceware.org/elfutils/Debuginfod.html) for retrieving debug symbols:
- Endpoint: `GET /buildid/<build-id>/debuginfo`
- Supports federated queries across multiple debuginfod servers
- Caches retrieved symbols in RustFS blob storage
- Rate-limited to respect upstream server policies
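The request path follows the debuginfod protocol cited above. A minimal sketch of URL construction and federated fallback order (hosts shown are public debuginfod servers, used here only for illustration; no HTTP call is made):

```python
def debuginfod_url(server: str, build_id: str) -> str:
    """debuginfod protocol: GET /buildid/<build-id>/debuginfo."""
    return f"{server.rstrip('/')}/buildid/{build_id}/debuginfo"

# A federated lookup would try each configured server in order:
SERVERS = ["https://debuginfod.fedoraproject.org", "https://debuginfod.ubuntu.com"]
candidates = [debuginfod_url(s, "b5381a457906d279073822a5ceb24c4bfef94ddb") for s in SERVERS]
```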
**Ubuntu ddeb Connector:**
The `DdebConnector` retrieves Ubuntu debug symbol packages (`.ddeb`):
- Sources: `ddebs.ubuntu.com` mirror
- Indexes: Reads `Packages.xz` for package metadata
- Extraction: Unpacks `.ddeb` AR archives to extract DWARF symbols
- Mapping: Links debug symbols to binary packages via Build-ID
**Debian Buildinfo Connector:**
The `BuildinfoConnector` retrieves Debian buildinfo files for reproducibility verification:
- Source: `buildinfos.debian.net` and snapshot archives
- Purpose: Provides build environment metadata for reproducible builds
- Fields extracted: `Build-Date`, `Build-Architecture`, `Checksums-Sha256`
- Integration: Cross-references with binary packages for provenance
**Alpine SecDB Connector:**
The `AlpineSecDbConnector` parses Alpine's security database:
- Source: `secfixes` blocks in APKBUILD files
- Repository: `alpine/aports` Git repository
- Format: YAML blocks mapping CVEs to fixed versions
- Example:
```yaml
secfixes:
3.0.11-r0:
- CVE-2024-0727
- CVE-2024-0728
```
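The `secfixes` block above has a simple two-level shape (fixed version → list of CVEs). A hand-rolled sketch of extracting it follows; the real connector presumably uses a proper YAML parser, so treat this as illustrative only.

```python
def parse_secfixes(text: str) -> dict[str, list[str]]:
    """Map fixed package versions to the CVEs they resolve, from a secfixes block."""
    fixes: dict[str, list[str]] = {}
    current = None
    in_block = False
    for line in text.splitlines():
        stripped = line.strip()
        if stripped == "secfixes:":
            in_block = True
            continue
        if not in_block or not stripped:
            continue
        if stripped.startswith("- ") and current:
            fixes[current].append(stripped[2:].strip())  # CVE id under the current version
        elif stripped.endswith(":"):
            current = stripped[:-1]  # new fixed-version key
            fixes[current] = []
    return fixes
```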
**OSV Dump Parser:**
The `OsvDumpParser` processes Google OSV database dumps for advisory cross-correlation:
- Source: `osv.dev` bulk exports (JSON)
- Purpose: CVE → commit range extraction for patch identification
- Cross-reference: Correlates OSV entries with distribution advisories
- Inconsistency detection: Identifies discrepancies between OSV and distro advisories
```csharp
public interface IOsvDumpParser
{
IAsyncEnumerable<OsvParsedEntry> ParseDumpAsync(Stream osvDumpStream, CancellationToken ct);
OsvCveIndex BuildCveIndex(IEnumerable<OsvParsedEntry> entries);
IEnumerable<AdvisoryCorrelation> CrossReferenceWithExternal(
OsvCveIndex osvIndex,
IEnumerable<ExternalAdvisory> externalAdvisories);
IEnumerable<AdvisoryInconsistency> DetectInconsistencies(
IEnumerable<AdvisoryCorrelation> correlations);
}
```
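The cross-reference and inconsistency-detection steps the interface describes can be sketched as a CVE-keyed join. The record shapes (`aliases`, `fixed_version`, `cve`) are assumptions for illustration, not the parser's actual models.

```python
def cross_reference(osv_entries: list[dict], distro_advisories: list[dict]) -> list[dict]:
    """Join OSV entries to distro advisories on CVE id and flag fixed-version disagreements."""
    osv_by_cve = {alias: e for e in osv_entries for alias in e.get("aliases", [])}
    correlations = []
    for adv in distro_advisories:
        osv = osv_by_cve.get(adv["cve"])
        if osv is None:
            continue  # no OSV coverage for this advisory
        correlations.append({
            "cve": adv["cve"],
            "distro_fixed": adv["fixed_version"],
            "osv_fixed": osv.get("fixed_version"),
            "inconsistent": osv.get("fixed_version") not in (None, adv["fixed_version"]),
        })
    return correlations
```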
**CLI Access:**
All connectors are manageable via the `stella groundtruth sources` CLI commands:
```bash
# List all connectors
stella groundtruth sources list
# Sync specific connector
stella groundtruth sources sync --source buildinfo-debian --full
# Enable/disable connectors
stella groundtruth sources enable ddeb-ubuntu
stella groundtruth sources disable debuginfod-fedora
```
See [Ground-Truth CLI Guide](../cli/guides/ground-truth-cli.md) for complete CLI documentation.
### 10.3 Key Performance Indicators
| KPI | Target | Description |
|-----|--------|-------------|
| Per-function match rate | >= 90% | Functions matched in post-patch binary |
| False-negative patch detection | <= 5% | Patched functions incorrectly classified as unpatched |
| SBOM canonical-hash stability | 3/3 | Determinism across independent runs |
| Binary reconstruction equivalence | Trend | Rebuilt binary matches original |
| End-to-end verify time (p95, cold) | Trend | Offline verification performance |
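The first two KPIs reduce to simple ratios against their targets. A sketch of evaluating them from raw counts (the function and field names are illustrative):

```python
def kpi_report(matched: int, total_functions: int, missed_patched: int, total_patched: int) -> dict:
    """Per-function match rate and false-negative patch-detection rate against KPI targets."""
    match_rate = matched / total_functions
    fn_rate = missed_patched / total_patched
    return {
        "match_rate": match_rate,
        "match_rate_ok": match_rate >= 0.90,          # target: >= 90%
        "false_negative_rate": fn_rate,
        "false_negative_ok": fn_rate <= 0.05,         # target: <= 5%
    }
```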
### 10.4 Validation Harness
The validation harness (`IValidationHarness`) orchestrates end-to-end verification:
```
Binary Pair (pre/post) → Symbol Recovery → IR Lifting → Fingerprinting → Matching → Metrics
```
### 10.5 Evidence Bundle Format
Evidence bundles follow OCI/ORAS conventions:
```
<pkg>-<advisory>-bundle.oci.tar
├── manifest.json # OCI manifest
└── blobs/
├── sha256:<sbom> # Canonical SBOM
├── sha256:<pre-bin> # Pre-fix binary
├── sha256:<post-bin> # Post-fix binary
├── sha256:<delta-sig> # DSSE delta-sig predicate
└── sha256:<timestamp> # RFC 3161 timestamp
```
### 10.6 Related Documentation
- [Golden Corpus KPIs](../../benchmarks/golden-corpus-kpis.md)
- [Golden Corpus Seed List](../../benchmarks/golden-corpus-seed-list.md)
- [Ground-Truth Corpus Specification](../../benchmarks/ground-truth-corpus.md)
---
## 11. References
- Advisory: `docs/product/advisories/21-Dec-2025 - Mapping Evidence Within Compiled Binaries.md`
- Scanner Native Analysis: `src/Scanner/StellaOps.Scanner.Analyzers.Native/`
@@ -1248,8 +1424,9 @@ binaryindex:
- **Semantic Diffing Sprint:** `docs/implplan/SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md`
- **Semantic Library:** `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`
- **Semantic Tests:** `src/BinaryIndex/__Tests/StellaOps.BinaryIndex.Semantic.Tests/`
- **Golden Corpus Sprints:** `docs/implplan/SPRINT_20260121_034_BinaryIndex_golden_corpus_foundation.md`
---
*Document Version: 1.1.1*
*Last Updated: 2026-01-14*
*Document Version: 1.2.0*
*Last Updated: 2026-01-21*

View File

@@ -0,0 +1,347 @@
# Golden Corpus Folder Layout
Sprint: SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
Task: GCB-006 - Document corpus folder layout and maintenance procedures
## Overview
The golden corpus is a curated dataset of pre/post security patch binary pairs used for:
- Validating binary matching algorithms
- Benchmarking reproducibility verification
- Training machine learning models for function identification
- Generating audit-ready evidence bundles
## Root Layout
```
golden-corpus/
├── corpus/ # Security pairs organized by distro
│ ├── debian/
│ ├── ubuntu/
│ └── alpine/
├── mirrors/ # Local mirrors of upstream sources
│ ├── debian/
│ ├── ubuntu/
│ ├── alpine/
│ └── osv/
├── harness/ # Build and verification tooling
│ ├── chroots/
│ ├── lifter-matcher/
│ ├── sbom-canonicalizer/
│ └── verifier/
├── evidence/ # Generated evidence bundles
│ └── <pkg>-<advisory>-bundle.oci.tar
└── bench/ # Benchmark data and baselines
├── baselines/
└── results/
```
## Corpus Directory Structure
Each security pair follows a consistent structure:
```
corpus/<distro>/<package>/<advisory-id>/
├── pre/ # Pre-patch (vulnerable) artifacts
│ ├── src/ # Source code
│ │ ├── *.tar.gz # Original source tarball
│ │ ├── debian/ # Packaging metadata
│ │ └── buildinfo # Build reproducibility info
│ └── debs/ # Built binaries
│ ├── *.deb # Binary packages
│ ├── *.ddeb # Debug symbols
│ └── buildlog # Build log
├── post/ # Post-patch (fixed) artifacts
│ ├── src/
│ └── debs/
└── metadata/
├── advisory.json # Advisory details
├── osv.json # OSV format vulnerability
├── pair-manifest.json # Pair configuration
└── ground-truth.json # Function-level ground truth
```
### Debian Example
```
corpus/debian/openssl/DSA-5678-1/
├── pre/
│ ├── src/
│ │ ├── openssl_3.0.10.orig.tar.gz
│ │ ├── openssl_3.0.10-1.debian.tar.xz
│ │ ├── openssl_3.0.10-1.dsc
│ │ └── openssl_3.0.10-1.buildinfo
│ └── debs/
│ ├── libssl3_3.0.10-1_amd64.deb
│ ├── libssl3-dbgsym_3.0.10-1_amd64.ddeb
│ └── build.log
├── post/
│ ├── src/
│ │ ├── openssl_3.0.11.orig.tar.gz
│ │ ├── openssl_3.0.11-1.debian.tar.xz
│ │ └── ...
│ └── debs/
│ └── ...
└── metadata/
├── advisory.json
└── ground-truth.json
```
### Ubuntu Example
```
corpus/ubuntu/curl/USN-1234-1/
├── pre/
│ ├── src/
│ │ └── curl_8.4.0-1ubuntu1.tar.xz
│ └── debs/
│ └── libcurl4_8.4.0-1ubuntu1_amd64.deb
├── post/
│ └── ...
└── metadata/
├── advisory.json
└── usn.json
```
### Alpine Example
```
corpus/alpine/zlib/CVE-2022-37434/
├── pre/
│ ├── src/
│ │ └── APKBUILD
│ └── apks/
│ └── zlib-1.2.12-r2.apk
├── post/
│ └── ...
└── metadata/
└── secdb-entry.json
```
## Mirrors Directory Structure
Local mirrors cache upstream artifacts for offline operation:
```
mirrors/
├── debian/
│ ├── archive/ # snapshot.debian.org mirrors
│ │ └── pool/main/o/openssl/
│ ├── snapshot/ # Point-in-time snapshots
│ │ └── 20260101T000000Z/
│ └── buildinfo/ # buildinfos.debian.net cache
│ └── <source-name>/
├── ubuntu/
│ ├── archive/ # archive.ubuntu.com mirrors
│ ├── usn-index/ # USN metadata
│ │ └── usn-db.json
│ └── launchpad/ # Build logs from Launchpad
├── alpine/
│ ├── packages/ # Alpine package mirror
│ └── secdb/ # Security database
│ └── community.json
└── osv/
├── all.zip # Full OSV database
└── debian/ # Distro-specific extracts
```
## Harness Directory Structure
Build and verification tooling:
```
harness/
├── chroots/ # Build environments
│ ├── debian-bookworm-amd64/
│ ├── debian-bullseye-amd64/
│ ├── ubuntu-noble-amd64/
│ └── alpine-3.19-amd64/
├── lifter-matcher/ # Binary analysis tools
│ ├── ghidra/ # Ghidra installation
│ ├── bsim-server/ # BSim database server
│ └── semantic-diffing/ # Semantic diff tools
├── sbom-canonicalizer/ # SBOM normalization
│ └── config/
└── verifier/ # Standalone verifier
├── stella-verifier # Verifier binary
└── trust-profiles/ # Trust profiles
```
## Evidence Directory Structure
Generated bundles for audit/compliance:
```
evidence/
├── openssl-DSA-5678-1-bundle.oci.tar
├── curl-USN-1234-1-bundle.oci.tar
└── manifests/
└── inventory.json
```
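The `manifests/inventory.json` index can be rebuilt from the bundle files on disk. A sketch (the JSON shape shown is illustrative; the real inventory schema may carry more fields):

```bash
# build-inventory.sh - regenerate an inventory of evidence bundles (sketch).
# Emits one {file, sha256} record per *.oci.tar in the evidence directory.
build_inventory() {
  evidence_dir=$1
  printf '{"bundles": ['
  first=1
  for f in "$evidence_dir"/*.oci.tar; do
    [ -e "$f" ] || continue
    [ "$first" -eq 1 ] || printf ', '
    first=0
    printf '{"file": "%s", "sha256": "%s"}' \
      "$(basename "$f")" "$(sha256sum "$f" | cut -d' ' -f1)"
  done
  printf ']}\n'
}

demo=$(mktemp -d)
printf 'payload' > "$demo/openssl-DSA-5678-1-bundle.oci.tar"
build_inventory "$demo"    # in production: > evidence/manifests/inventory.json
rm -rf "$demo"
```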
### Bundle Internal Structure (OCI Format)
```
openssl-DSA-5678-1-bundle.oci.tar/
├── oci-layout # OCI layout version
├── index.json # OCI index with referrers
├── blobs/
│ └── sha256/
│ ├── <manifest> # Bundle manifest
│ ├── <sbom-pre> # Pre-patch SBOM
│ ├── <sbom-post> # Post-patch SBOM
│ ├── <binary-pre> # Pre-patch binary
│ ├── <binary-post> # Post-patch binary
│ ├── <delta-sig> # DSSE delta-sig predicate
│ ├── <provenance> # Build provenance
│ └── <timestamp> # RFC 3161 timestamp
└── manifest.json # Signed bundle manifest
```
## Bench Directory Structure
Benchmark data and KPI baselines:
```
bench/
├── baselines/
│ ├── current.json # Active KPI baseline
│ └── archive/ # Historical baselines
│ ├── baseline-20260115.json
│ └── baseline-20260108.json
├── results/
│ ├── 20260122120000.json # Validation run results
│ └── ...
└── reports/
└── regression-report-*.md
```
### Baseline File Format
```json
{
"baselineId": "baseline-20260122120000",
"createdAt": "2026-01-22T12:00:00Z",
"source": "abc123def456",
"description": "Post-semantic-diffing-v2 baseline",
"precision": 0.95,
"recall": 0.92,
"falseNegativeRate": 0.08,
"deterministicReplayRate": 1.0,
"ttfrpP95Ms": 150,
"additionalKpis": {}
}
```
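The regression gate implied by this format can be sketched in shell. This sketch parses the flat, line-per-field JSON shown above with `sed` and applies a 1 pp tolerance on precision and recall; a real implementation would use a JSON parser (e.g. `jq`), and the tolerance value is an assumption:

```bash
# baseline-gate.sh - compare validation results against the active baseline (sketch).
# field FILE KEY extracts a numeric field from flat, line-per-field JSON.
field() { sed -n "s/.*\"$2\": *\([0-9.]*\).*/\1/p" "$1"; }

gate() {
  results=$1; baseline=$2
  awk -v rp="$(field "$results" precision)" -v bp="$(field "$baseline" precision)" \
      -v rr="$(field "$results" recall)"    -v br="$(field "$baseline" recall)" \
      'BEGIN { exit !(rp >= bp - 0.01 && rr >= br - 0.01) }'
}

# Demo with inline fixtures.
tmp=$(mktemp -d)
printf '{\n"precision": 0.95,\n"recall": 0.92\n}\n' > "$tmp/current.json"
printf '{\n"precision": 0.96,\n"recall": 0.93\n}\n' > "$tmp/results.json"
gate "$tmp/results.json" "$tmp/current.json" && echo "PASS"   # prints "PASS"
rm -rf "$tmp"
```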
## File Naming Conventions
| Type | Pattern | Example |
|------|---------|---------|
| Advisory ID (Debian) | `DSA-<number>-<revision>` | `DSA-5678-1` |
| Advisory ID (Ubuntu) | `USN-<number>-<revision>` | `USN-1234-1` |
| Advisory ID (Alpine) | `CVE-<year>-<number>` | `CVE-2022-37434` |
| Bundle file | `<pkg>-<advisory>-bundle.oci.tar` | `openssl-DSA-5678-1-bundle.oci.tar` |
| Baseline file | `baseline-<timestamp>.json` | `baseline-20260122120000.json` |
| Results file | `<timestamp>.json` | `20260122120000.json` |
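These patterns can be enforced mechanically at ingestion time. A sketch using `grep -E` (the regexes are a direct transcription of the table and may need tightening for edge cases):

```bash
# validate-name.sh - check artifact names against the naming conventions (sketch).
valid_name() {
  kind=$1; name=$2
  case $kind in
    advisory) echo "$name" | grep -Eq '^((DSA|USN)-[0-9]+-[0-9]+|CVE-[0-9]{4}-[0-9]+)$' ;;
    bundle)   echo "$name" | grep -Eq '^[a-z0-9.+-]+-((DSA|USN)-[0-9]+-[0-9]+|CVE-[0-9]{4}-[0-9]+)-bundle\.oci\.tar$' ;;
    baseline) echo "$name" | grep -Eq '^baseline-[0-9]{8,14}\.json$' ;;
    results)  echo "$name" | grep -Eq '^[0-9]{14}\.json$' ;;
    *) return 2 ;;
  esac
}

valid_name bundle "openssl-DSA-5678-1-bundle.oci.tar" && echo "bundle name ok"
valid_name results "20260122120000.json" && echo "results name ok"
```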
## Metadata Files
### advisory.json
```json
{
"advisoryId": "DSA-5678-1",
"cves": ["CVE-2024-1234", "CVE-2024-5678"],
"package": "openssl",
"vulnerableVersions": ["3.0.10-1"],
"fixedVersions": ["3.0.11-1"],
"severity": "high",
"publishedAt": "2024-11-15T00:00:00Z",
"summary": "Multiple vulnerabilities in OpenSSL"
}
```
### pair-manifest.json
```json
{
"pairId": "openssl-DSA-5678-1",
"package": "openssl",
"distribution": "debian",
"suite": "bookworm",
"architecture": "amd64",
"preVersion": "3.0.10-1",
"postVersion": "3.0.11-1",
"binaries": [
"libssl3",
"libcrypto3"
],
"createdAt": "2026-01-15T10:00:00Z",
"validatedAt": "2026-01-22T12:00:00Z"
}
```
### ground-truth.json
```json
{
"pairId": "openssl-DSA-5678-1",
"binary": "libcrypto.so.3",
"functions": [
{
"name": "EVP_DigestInit_ex",
"preAddress": "0x12345",
"postAddress": "0x12347",
"status": "modified",
"confidence": 1.0
},
{
"name": "EVP_DigestUpdate",
"preAddress": "0x12400",
"postAddress": "0x12400",
"status": "unchanged",
"confidence": 1.0
}
],
"metadata": {
"generatedBy": "manual-annotation",
"reviewedBy": "security-team",
"reviewedAt": "2026-01-20T14:00:00Z"
}
}
```
## Access Patterns
### Read-Only Access
- Validation harness reads corpus pairs
- CI reads baselines for regression checks
- Auditors read evidence bundles
### Write Access
- Corpus ingestion adds new pairs
- Baseline update writes new baseline files
- Bundle export creates evidence bundles
### Sync Access
- Mirror sync updates upstream caches
- Scheduled jobs refresh OSV database
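This read/write split can be approximated with POSIX permissions. A sketch (the exact mode split is an assumption; group ownership changes need root and are omitted):

```bash
# set-corpus-perms.sh - apply the read/write split described above (sketch).
set_perms() {
  root=$1
  # Read-mostly trees: corpus pairs, baselines, evidence bundles.
  chmod -R a+rX,go-w "$root/corpus" "$root/bench/baselines" "$root/evidence" 2>/dev/null
  # Writable trees: mirror sync and validation runs write here.
  chmod -R ug+rwX,o+rX "$root/mirrors" "$root/bench/results" 2>/dev/null
}

demo=$(mktemp -d)
mkdir -p "$demo/corpus" "$demo/bench/baselines" "$demo/bench/results" "$demo/mirrors" "$demo/evidence"
set_perms "$demo"
ls -ld "$demo/corpus" >/dev/null && echo "permissions applied"
rm -rf "$demo"
```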
## Storage Requirements
| Component | Typical Size | Growth Rate |
|-----------|--------------|-------------|
| Corpus (per pair) | 50-500 MB | N/A |
| Mirrors (Debian) | 10-50 GB | Monthly |
| Mirrors (Ubuntu) | 5-20 GB | Monthly |
| Mirrors (Alpine) | 1-5 GB | Monthly |
| OSV Database | 500 MB | Weekly |
| Evidence bundles | 100-500 MB each | Per pair |
| Baselines | < 10 KB each | Per run |
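Actual usage can be compared against these estimates with a small `du`-based report; the directory names follow the layout above:

```bash
# corpus-usage.sh - report per-component disk usage (sketch).
usage_report() {
  root=$1
  for component in corpus mirrors harness evidence bench; do
    if [ -d "$root/$component" ]; then
      printf '%-10s %s\n' "$component" "$(du -sh "$root/$component" | cut -f1)"
    fi
  done
}

usage_report "${CORPUS_ROOT:-/data/golden-corpus}"
```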
## Related Documentation
- [Ground Truth Corpus Overview](ground-truth-corpus.md)
- [Golden Corpus Maintenance](golden-corpus-maintenance.md)
- [Corpus Ingestion Operations](corpus-ingestion-operations.md)
- [Golden Corpus Operations Runbook](../../runbooks/golden-corpus-operations.md)


@@ -0,0 +1,492 @@
# Golden Corpus Maintenance
Sprint: SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
Task: GCB-006 - Document corpus folder layout and maintenance procedures
## Overview
This document describes maintenance procedures for the golden corpus, including:
- Mirror synchronization
- Baseline management
- Evidence bundle generation
- Health monitoring
## Mirror Synchronization
### Automated Sync Schedule
Mirror sync should be automated via cron jobs or CI scheduled workflows.
#### Recommended Schedule
| Mirror | Frequency | Rationale |
|--------|-----------|-----------|
| Debian archive | Daily | Security updates published daily |
| Debian buildinfo | Daily | Matches archive updates |
| Ubuntu archive | Daily | Security updates published daily |
| Ubuntu USN index | Hourly | USN metadata changes frequently |
| Alpine secdb | Daily | Less frequent updates |
| OSV database | Hourly | Aggregates multiple sources |
### Sync Scripts
#### Debian Mirror Sync
```bash
#!/bin/bash
# sync-debian-mirrors.sh
# Syncs Debian archives and buildinfo
set -euo pipefail
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
DEBIAN_MIRROR="${DEBIAN_MIRROR:-https://snapshot.debian.org}"
BUILDINFO_URL="${BUILDINFO_URL:-https://buildinfos.debian.net}"
# Packages to mirror (security-relevant)
PACKAGES=(openssl curl zlib glibc libxml2 libpng)
# Sync source packages
for pkg in "${PACKAGES[@]}"; do
echo "Syncing Debian sources for: $pkg"
# Create package directory
mkdir -p "$MIRRORS_ROOT/debian/archive/pool/main/${pkg:0:1}/$pkg"
# Download available versions
rsync -avz --progress \
"rsync://snapshot.debian.org/snapshot/debian/pool/main/${pkg:0:1}/$pkg/" \
"$MIRRORS_ROOT/debian/archive/pool/main/${pkg:0:1}/$pkg/"
done
# Sync buildinfo files
for pkg in "${PACKAGES[@]}"; do
echo "Syncing buildinfo for: $pkg"
mkdir -p "$MIRRORS_ROOT/debian/buildinfo/$pkg"
# Use wget to fetch buildinfo index and files
wget -r -np -nH --cut-dirs=2 -P "$MIRRORS_ROOT/debian/buildinfo/$pkg" \
"$BUILDINFO_URL/api/v1/buildinfo/$pkg/" || true
done
echo "Debian mirror sync complete"
date > "$MIRRORS_ROOT/debian/.last-sync"
```
#### Ubuntu Mirror Sync
```bash
#!/bin/bash
# sync-ubuntu-mirrors.sh
# Syncs Ubuntu archives and USN metadata
set -euo pipefail
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
UBUNTU_ARCHIVE="https://archive.ubuntu.com/ubuntu"
USN_API="https://ubuntu.com/security/notices.json"
# Sync USN database
echo "Syncing Ubuntu USN database..."
mkdir -p "$MIRRORS_ROOT/ubuntu/usn-index"
curl -sSL "$USN_API" -o "$MIRRORS_ROOT/ubuntu/usn-index/usn-db.json.tmp"
mv "$MIRRORS_ROOT/ubuntu/usn-index/usn-db.json.tmp" "$MIRRORS_ROOT/ubuntu/usn-index/usn-db.json"
# Sync packages (similar to Debian)
PACKAGES=(openssl curl zlib1g libxml2)
for pkg in "${PACKAGES[@]}"; do
echo "Syncing Ubuntu sources for: $pkg"
mkdir -p "$MIRRORS_ROOT/ubuntu/archive/pool/main/${pkg:0:1}/$pkg"
# ... sync logic
done
echo "Ubuntu mirror sync complete"
date > "$MIRRORS_ROOT/ubuntu/.last-sync"
```
#### Alpine SecDB Sync
```bash
#!/bin/bash
# sync-alpine-secdb.sh
# Syncs Alpine security database
set -euo pipefail
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
ALPINE_SECDB="https://secdb.alpinelinux.org"
mkdir -p "$MIRRORS_ROOT/alpine/secdb"
# Download all security databases
for branch in v3.17 v3.18 v3.19 v3.20 edge; do
for repo in main community; do
echo "Syncing Alpine secdb: $branch/$repo"
curl -sSL "$ALPINE_SECDB/$branch/$repo.json" \
-o "$MIRRORS_ROOT/alpine/secdb/${branch}-${repo}.json" || true
done
done
echo "Alpine secdb sync complete"
date > "$MIRRORS_ROOT/alpine/.last-sync"
```
#### OSV Database Sync
```bash
#!/bin/bash
# sync-osv.sh
# Syncs OSV vulnerability database
set -euo pipefail
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
OSV_URL="https://osv-vulnerabilities.storage.googleapis.com"
mkdir -p "$MIRRORS_ROOT/osv"
# Download full database
echo "Downloading OSV all.zip..."
curl -sSL "$OSV_URL/all.zip" -o "$MIRRORS_ROOT/osv/all.zip.tmp"
mv "$MIRRORS_ROOT/osv/all.zip.tmp" "$MIRRORS_ROOT/osv/all.zip"
# Extract ecosystem-specific databases
for ecosystem in Debian Ubuntu Alpine; do
mkdir -p "$MIRRORS_ROOT/osv/$ecosystem"
unzip -o -q "$MIRRORS_ROOT/osv/all.zip" "$ecosystem/*" -d "$MIRRORS_ROOT/osv/" || true
done
echo "OSV sync complete"
date > "$MIRRORS_ROOT/osv/.last-sync"
```
### Cron Configuration
```cron
# /etc/cron.d/golden-corpus-sync
# Mirror sync jobs
0 */4 * * * corpus /opt/golden-corpus/scripts/sync-debian-mirrors.sh >> /var/log/corpus/debian-sync.log 2>&1
0 */4 * * * corpus /opt/golden-corpus/scripts/sync-ubuntu-mirrors.sh >> /var/log/corpus/ubuntu-sync.log 2>&1
0 6 * * * corpus /opt/golden-corpus/scripts/sync-alpine-secdb.sh >> /var/log/corpus/alpine-sync.log 2>&1
0 * * * * corpus /opt/golden-corpus/scripts/sync-osv.sh >> /var/log/corpus/osv-sync.log 2>&1
# Health check
*/15 * * * * corpus /opt/golden-corpus/scripts/check-mirror-health.sh >> /var/log/corpus/health.log 2>&1
```
## Baseline Management
### When to Update Baselines
Update the KPI baseline when:
1. Algorithm improvements are merged (expected KPI improvement)
2. New corpus pairs are added (may change baseline metrics)
3. False positives/negatives are corrected in ground truth
4. Analysis tools receive major version upgrades
### Baseline Update Procedure
#### 1. Run Full Validation
```bash
# Run validation on the full corpus
stella groundtruth validate run \
--matcher semantic-diffing \
--output bench/results/$(date +%Y%m%d%H%M%S).json \
--verbose
```
#### 2. Review Results
```bash
# Check metrics
stella groundtruth validate metrics --run-id latest
# Compare against current baseline
stella groundtruth validate check \
--results bench/results/latest.json \
--baseline bench/baselines/current.json
```
#### 3. Update Baseline
Only if regression check passes or improvements are expected:
```bash
# Archive current baseline
cp bench/baselines/current.json \
bench/baselines/archive/baseline-$(date +%Y%m%d).json
# Update baseline
stella groundtruth baseline update \
--from-results bench/results/latest.json \
--output bench/baselines/current.json \
--description "Post algorithm-v2.3 update" \
--source "$(git rev-parse HEAD)"
```
#### 4. Commit and Document
```bash
# Commit the baseline update
git add bench/baselines/
git commit -m "chore(bench): update golden corpus baseline
Reason: Algorithm v2.3 improvements
Previous baseline: baseline-20260115.json
Metrics:
- Precision: 0.95 -> 0.97 (+2pp)
- Recall: 0.92 -> 0.94 (+2pp)
- FN Rate: 0.08 -> 0.06 (-2pp)
- Determinism: 100%
- TTFRP p95: 150ms -> 140ms (-7%)"
git push
```
### Baseline Rollback
If a baseline update causes issues:
```bash
# Restore previous baseline
cp bench/baselines/archive/baseline-20260115.json \
bench/baselines/current.json
git add bench/baselines/current.json
git commit -m "revert(bench): rollback baseline to 20260115"
git push
```
## Evidence Bundle Generation
### Manual Bundle Export
```bash
# Export bundle for specific packages
stella groundtruth bundle export \
--packages openssl,curl,zlib \
--distros debian,ubuntu \
--output evidence/security-bundle-$(date +%Y%m%d).tar.gz \
--sign-with-cosign \
--include-debug \
--include-kpis \
--include-timestamps
```
### Automated Bundle Generation
Schedule bundle generation for compliance reporting:
```bash
#!/bin/bash
# generate-compliance-bundles.sh
# Run monthly for audit evidence
set -euo pipefail
EVIDENCE_DIR="/data/golden-corpus/evidence"
MONTH=$(date +%Y%m)
# Generate bundles for each distro
for distro in debian ubuntu alpine; do
stella groundtruth bundle export \
--distros "$distro" \
--packages all \
--output "$EVIDENCE_DIR/$distro-bundle-$MONTH.tar.gz" \
--sign-with-cosign \
--include-kpis \
--include-timestamps
done
# Create manifest
echo "{\"month\": \"$MONTH\", \"bundles\": [\"debian\", \"ubuntu\", \"alpine\"]}" \
> "$EVIDENCE_DIR/manifest-$MONTH.json"
```
### Bundle Verification
Always verify bundles after generation:
```bash
# Verify bundle integrity
stella groundtruth bundle import \
--input evidence/security-bundle-20260122.tar.gz \
--verify \
--trusted-keys /etc/stellaops/trusted-keys.pub \
--trust-profile /etc/stellaops/trust-profiles/global.json \
--output verification-report.md
```
## Health Monitoring
### Doctor Checks
Run Doctor checks regularly to validate corpus health:
```bash
# Run all corpus-related checks
stella doctor --check "check.binaryanalysis.corpus.*"
# Specific checks
stella doctor --check check.binaryanalysis.corpus.mirror.freshness
stella doctor --check check.binaryanalysis.corpus.kpi.baseline
stella doctor --check check.binaryanalysis.debuginfod.availability
```
### Health Check Script
```bash
#!/bin/bash
# check-mirror-health.sh
# Validates mirror freshness and connectivity
set -euo pipefail
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
STALE_THRESHOLD_DAYS=7
ALERTS=""
check_mirror() {
local mirror_name=$1
local last_sync_file=$2
local max_age=$3
if [[ ! -f "$last_sync_file" ]]; then
ALERTS+="CRITICAL: $mirror_name has never been synced\n"
return
fi
local last_sync=$(cat "$last_sync_file")
local last_sync_epoch=$(date -d "$last_sync" +%s)
local now_epoch=$(date +%s)
local age_days=$(( (now_epoch - last_sync_epoch) / 86400 ))
if [[ $age_days -gt $max_age ]]; then
ALERTS+="WARNING: $mirror_name is $age_days days old (threshold: $max_age)\n"
fi
}
# Check each mirror
check_mirror "Debian" "$MIRRORS_ROOT/debian/.last-sync" $STALE_THRESHOLD_DAYS
check_mirror "Ubuntu" "$MIRRORS_ROOT/ubuntu/.last-sync" $STALE_THRESHOLD_DAYS
check_mirror "Alpine" "$MIRRORS_ROOT/alpine/.last-sync" $STALE_THRESHOLD_DAYS
check_mirror "OSV" "$MIRRORS_ROOT/osv/.last-sync" 1  # OSV syncs hourly; alert if more than 1 day old
# Check connectivity
for url in \
"https://snapshot.debian.org" \
"https://buildinfos.debian.net" \
"https://ubuntu.com/security/notices.json" \
"https://secdb.alpinelinux.org"; do
if ! curl -sSf --connect-timeout 5 "$url" > /dev/null 2>&1; then
ALERTS+="ERROR: Cannot reach $url\n"
fi
done
# Report results
if [[ -n "$ALERTS" ]]; then
echo -e "Golden Corpus Health Issues:\n$ALERTS"
# Send alert (customize for your alerting system)
# curl -X POST -d "$ALERTS" https://alerts.example.com/webhook
exit 1
fi
echo "All mirrors healthy at $(date)"
```
### Monitoring Metrics
Export these metrics to your monitoring system:
| Metric | Description | Alert Threshold |
|--------|-------------|-----------------|
| `corpus.mirrors.age_seconds` | Time since last mirror sync | > 7 days |
| `corpus.pairs.total` | Total number of security pairs | N/A (info) |
| `corpus.validation.precision` | Latest precision rate | < baseline - 0.01 |
| `corpus.validation.recall` | Latest recall rate | < baseline - 0.01 |
| `corpus.validation.determinism` | Deterministic replay rate | < 1.0 |
| `corpus.bundle.count` | Number of evidence bundles | N/A (info) |
| `corpus.baseline.age_days` | Days since baseline update | > 30 days |
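The mirror-age metric can be derived directly from the `.last-sync` markers written by the sync scripts. A sketch in Prometheus textfile-collector format (it assumes GNU `date -d`, as the health check script does; in production, redirect the output into the node_exporter textfile directory):

```bash
# export-corpus-metrics.sh - emit corpus_mirror_age_seconds per mirror (sketch).
emit_mirror_age() {
  mirrors_root=$1
  now=$(date +%s)
  echo "# TYPE corpus_mirror_age_seconds gauge"
  for sync_file in "$mirrors_root"/*/.last-sync; do
    [ -f "$sync_file" ] || continue
    mirror=$(basename "$(dirname "$sync_file")")
    age=$(( now - $(date -d "$(cat "$sync_file")" +%s) ))
    printf 'corpus_mirror_age_seconds{mirror="%s"} %d\n' "$mirror" "$age"
  done
}

demo=$(mktemp -d)
mkdir -p "$demo/debian"
date > "$demo/debian/.last-sync"
emit_mirror_age "$demo"
rm -rf "$demo"
```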
### Prometheus Metrics Example
```yaml
# prometheus-corpus-metrics.yaml
groups:
- name: golden-corpus
rules:
- alert: CorpusMirrorStale
expr: corpus_mirror_age_seconds > 604800 # 7 days
labels:
severity: warning
annotations:
summary: "Corpus mirror {{ $labels.mirror }} is stale"
- alert: CorpusRegressionDetected
expr: corpus_validation_precision < corpus_baseline_precision - 0.01
labels:
severity: critical
annotations:
summary: "Precision regression detected in golden corpus validation"
- alert: CorpusDeterminismFailure
expr: corpus_validation_determinism < 1.0
labels:
severity: critical
annotations:
summary: "Non-deterministic replay detected"
```
## Cleanup and Archival
### Archive Old Results
```bash
#!/bin/bash
# archive-old-results.sh
# Archives results older than 90 days
set -euo pipefail
RESULTS_DIR="/data/golden-corpus/bench/results"
ARCHIVE_DIR="/data/golden-corpus/bench/archive"
AGE_DAYS=90
mkdir -p "$ARCHIVE_DIR"
find "$RESULTS_DIR" -name "*.json" -mtime +$AGE_DAYS -exec \
mv {} "$ARCHIVE_DIR/" \;
# Compress archived results by month
cd "$ARCHIVE_DIR"
for month in $(ls *.json 2>/dev/null | cut -c1-6 | sort -u); do
tar -czf "results-$month.tar.gz" "${month}"*.json && \
rm -f "${month}"*.json
done
```
### Prune Old Baselines
Keep only the last N baselines:
```bash
#!/bin/bash
# prune-baselines.sh
# Keeps only the 10 most recent baseline archives
BASELINE_ARCHIVE="/data/golden-corpus/bench/baselines/archive"
KEEP_COUNT=10
cd "$BASELINE_ARCHIVE"
ls -t baseline-*.json | tail -n +$((KEEP_COUNT + 1)) | xargs -r rm -f
```
## Related Documentation
- [Golden Corpus Folder Layout](golden-corpus-layout.md)
- [Ground Truth Corpus Overview](ground-truth-corpus.md)
- [Golden Corpus Operations Runbook](../../runbooks/golden-corpus-operations.md)


@@ -23,10 +23,12 @@ The `stella` CLI is the operator-facing Swiss army knife for scans, exports, pol
- Versioned command docs in `docs/modules/cli/guides`.
- Plugin catalogue in `plugins/cli/**` (restart-only).
## Related resources
- ./guides/20_REFERENCE.md
- ./guides/cli-reference.md
- ./guides/commands/analytics.md
- ./guides/policy.md
- ./guides/trust-profiles.md
## Backlog references
- DOCS-CLI-OBS-52-001 / DOCS-CLI-FORENSICS-53-001 in ../../TASKS.md.


@@ -51,10 +51,11 @@ Status key:
| UI capability | CLI command(s) | Status | Notes / Tasks |
|---------------|----------------|--------|---------------|
| Advisory observations search | `stella vuln observations` | ✅ Available | Implemented via `BuildVulnCommand`. |
| Advisory linkset export | `stella advisory linkset show/export` | 🟩 Planned | `CLI-LNM-22-001`. |
| VEX observations / linksets | `stella vex obs get/linkset show` | 🟩 Planned | `CLI-LNM-22-002`. |
| SBOM overlay export | `stella sbom overlay apply/export` | 🟩 Planned | Scoped to upcoming SBOM CLI sprint (`SBOM-CONSOLE-23-001/002` + CLI backlog). |
| SBOM Lake analytics (`/analytics/sbom-lake`) | `stella analytics sbom-lake <subcommand>` | ✅ Available | CLI guide at `docs/modules/cli/guides/commands/analytics.md` (SPRINT_20260120_032). |
---
@@ -151,5 +152,5 @@ The script should emit a parity report that feeds into the Downloads workspace (
---
*Last updated: 2026-01-20 (Sprint 20260120).*


@@ -1,5 +1,5 @@
version: 1
generated: 2026-01-20T00:00:00Z
compatibility:
policy: "SemVer-like: commands/flags/exitCodes are backwards compatible within major version."
deprecation:
@@ -38,6 +38,108 @@ commands:
0: success
4: auth-misconfigured
5: token-invalid
- name: analytics
subcommands:
- name: sbom-lake
subcommands:
- name: suppliers
formats: [table, json, csv]
flags:
- name: environment
required: false
- name: limit
required: false
- name: format
required: false
values: [table, json, csv]
- name: output
required: false
exitCodes:
0: success
1: error
- name: licenses
formats: [table, json, csv]
flags:
- name: environment
required: false
- name: limit
required: false
- name: format
required: false
values: [table, json, csv]
- name: output
required: false
exitCodes:
0: success
1: error
- name: vulnerabilities
formats: [table, json, csv]
flags:
- name: environment
required: false
- name: min-severity
required: false
values: [critical, high, medium, low]
- name: limit
required: false
- name: format
required: false
values: [table, json, csv]
- name: output
required: false
exitCodes:
0: success
1: error
- name: backlog
formats: [table, json, csv]
flags:
- name: environment
required: false
- name: limit
required: false
- name: format
required: false
values: [table, json, csv]
- name: output
required: false
exitCodes:
0: success
1: error
- name: attestation-coverage
formats: [table, json, csv]
flags:
- name: environment
required: false
- name: limit
required: false
- name: format
required: false
values: [table, json, csv]
- name: output
required: false
exitCodes:
0: success
1: error
- name: trends
formats: [table, json, csv]
flags:
- name: environment
required: false
- name: days
required: false
- name: series
required: false
values: [vulnerabilities, components, all]
- name: limit
required: false
- name: format
required: false
values: [table, json, csv]
- name: output
required: false
exitCodes:
0: success
1: error
telemetry:
defaultEnabled: false
envVars:


@@ -0,0 +1,47 @@
# stella analytics - Command Guide
## Commands
- `stella analytics sbom-lake suppliers [--environment <env>] [--limit <n>] [--format table|json|csv] [--output <path>]`
- `stella analytics sbom-lake licenses [--environment <env>] [--limit <n>] [--format table|json|csv] [--output <path>]`
- `stella analytics sbom-lake vulnerabilities [--environment <env>] [--min-severity <level>] [--limit <n>] [--format table|json|csv] [--output <path>]`
- `stella analytics sbom-lake backlog [--environment <env>] [--limit <n>] [--format table|json|csv] [--output <path>]`
- `stella analytics sbom-lake attestation-coverage [--environment <env>] [--limit <n>] [--format table|json|csv] [--output <path>]`
- `stella analytics sbom-lake trends [--environment <env>] [--days <n>] [--series vulnerabilities|components|all] [--limit <n>] [--format table|json|csv] [--output <path>]`
## Flags (common)
- `--format`: Output format for rendering (`table`, `json`, `csv`).
- `--output`: Write output to a file path instead of stdout.
- `--limit`: Cap the number of rows returned.
- `--environment`: Filter by environment name.
## SBOM lake notes
- Endpoints require the `analytics.read` scope.
- `--min-severity` accepts `critical`, `high`, `medium`, `low`.
- `--series` controls trend output (`vulnerabilities`, `components`, `all`).
- Tables use deterministic ordering (severity and counts first, then names).
## Examples
```bash
# Top suppliers
stella analytics sbom-lake suppliers --limit 20
# License distribution as CSV (prod)
stella analytics sbom-lake licenses --environment prod --format csv --output licenses.csv
# Vulnerability exposure in prod (high+)
stella analytics sbom-lake vulnerabilities --environment prod --min-severity high
# Fixable backlog with table output
stella analytics sbom-lake backlog --environment prod --limit 50
# Attestation coverage in staging, JSON output
stella analytics sbom-lake attestation-coverage --environment stage --format json
# 30-day trend snapshot (both series)
stella analytics sbom-lake trends --days 30 --series all --format csv --output trends.csv
```
## Offline/verification note
- If analytics exports arrive via offline bundles, verify the bundle first with
`stella bundle verify` before importing data into downstream reports.


@@ -16,6 +16,7 @@ graph TD
CLI --> EXPLAIN[Explainability]
CLI --> VEX[VEX & Decisioning]
CLI --> SBOM[SBOM Operations]
CLI --> ANALYTICS[Analytics & Insights]
CLI --> REPORT[Reporting & Export]
CLI --> OFFLINE[Offline Operations]
CLI --> SYSTEM[System & Config]
@@ -742,6 +743,601 @@ stella sbom merge --sbom <path1> --sbom <path2> [--output <path>] [--verbose]
---
## Analytics Commands
### stella analytics sbom-lake
Query SBOM lake analytics views (suppliers, licenses, vulnerabilities, backlog,
attestation coverage, trends).
**Usage:**
```bash
stella analytics sbom-lake <subcommand> [options]
```
**Subcommands:**
- `suppliers` - Supplier concentration
- `licenses` - License distribution
- `vulnerabilities` - CVE exposure (VEX-adjusted)
- `backlog` - Fixable vulnerability backlog
- `attestation-coverage` - Provenance/SLSA coverage
- `trends` - Time-series trends (vulnerabilities/components)
**Common options:**
| Option | Description |
|--------|-------------|
| `--environment <env>` | Filter to a specific environment |
| `--min-severity <level>` | Minimum severity (`critical`, `high`, `medium`, `low`) |
| `--days <n>` | Lookback window in days (trends only) |
| `--series <name>` | Trend series (`vulnerabilities`, `components`, `all`) |
| `--limit <n>` | Maximum number of rows |
| `--format <fmt>` | Output format: `table`, `json`, `csv` |
| `--output <path>` | Output file path |
**Example:**
```bash
stella analytics sbom-lake vulnerabilities --environment prod --min-severity high --format csv --output vuln.csv
```
---
## Ground-Truth Corpus Commands
### stella groundtruth
Manage ground-truth corpus for patch-paired binary verification. The corpus supports
precision validation of security advisories by maintaining symbol and binary pairs
from upstream sources.
**Sprint:** SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli
**Usage:**
```bash
stella groundtruth <subcommand> [options]
```
**Subcommands:**
- `sources` - Manage symbol source connectors
- `symbols` - Query and search symbols in the corpus
- `pairs` - Manage security pairs (vuln/patch binary pairs)
- `validate` - Run validation and view metrics
---
### stella groundtruth sources
Manage upstream symbol source connectors.
**Usage:**
```bash
stella groundtruth sources <command> [options]
```
**Subcommands:**
#### stella groundtruth sources list
List available symbol source connectors.
```bash
stella groundtruth sources list [--output-format table|json] [--verbose]
```
**Output:**
```
ID Display Name Status Last Sync
------------------------------------------------------------------------------------------
debuginfod-fedora Fedora Debuginfod Enabled 2026-01-22T10:00:00Z
debuginfod-ubuntu Ubuntu Debuginfod Enabled 2026-01-22T10:00:00Z
ddeb-ubuntu Ubuntu ddebs Enabled 2026-01-22T09:30:00Z
buildinfo-debian Debian Buildinfo Enabled 2026-01-22T08:00:00Z
secdb-alpine Alpine SecDB Enabled 2026-01-22T06:00:00Z
```
#### stella groundtruth sources enable
Enable a symbol source connector.
```bash
stella groundtruth sources enable <source> [--verbose]
```
**Arguments:**
- `<source>` - Source connector ID (e.g., `debuginfod-fedora`)
**Example:**
```bash
stella groundtruth sources enable debuginfod-fedora
```
#### stella groundtruth sources disable
Disable a symbol source connector.
```bash
stella groundtruth sources disable <source> [--verbose]
```
#### stella groundtruth sources sync
Synchronize symbol sources from upstream.
```bash
stella groundtruth sources sync [--source <id>] [--full] [--verbose]
```
**Options:**
| Option | Description |
|--------|-------------|
| `--source <id>` | Source connector ID (all if not specified) |
| `--full` | Perform a full sync instead of incremental |
**Example:**
```bash
# Incremental sync of all sources
stella groundtruth sources sync
# Full sync of Debian buildinfo
stella groundtruth sources sync --source buildinfo-debian --full
```
---
### stella groundtruth symbols
Query and search symbols in the corpus.
**Usage:**
```bash
stella groundtruth symbols <command> [options]
```
#### stella groundtruth symbols lookup
Look up symbols by debug ID (build-id).
```bash
stella groundtruth symbols lookup --debug-id <id> [--output-format table|json] [--verbose]
```
**Options:**
| Option | Alias | Description | Required |
|--------|-------|-------------|----------|
| `--debug-id` | `-d` | Debug ID (build-id) to look up | Yes |
| `--output-format` | `-O` | Output format: `table`, `json` | No |
**Example:**
```bash
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a --output-format json
```
**Output (table):**
```
Binary: libcrypto.so.3
Architecture: x86_64
Distribution: debian-bookworm
Package: openssl@3.0.11-1
Symbol Count: 4523
Sources: debuginfod-fedora, buildinfo-debian
```
#### stella groundtruth symbols search
Search symbols by package or distribution.
```bash
stella groundtruth symbols search [--package <name>] [--distro <distro>] [--limit <n>] [--output-format table|json] [--verbose]
```
**Options:**
| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| `--package` | `-p` | Package name to search for | - |
| `--distro` | | Distribution filter (debian, ubuntu, alpine) | - |
| `--limit` | `-l` | Maximum results | 20 |
**Example:**
```bash
stella groundtruth symbols search --package openssl --distro debian --limit 50
```
---
### stella groundtruth pairs
Manage security pairs (vulnerable/patched binary pairs) in the corpus.
**Usage:**
```bash
stella groundtruth pairs <command> [options]
```
#### stella groundtruth pairs create
Create a new security pair.
```bash
stella groundtruth pairs create --cve <cve-id> --vuln-pkg <pkg=ver> --patch-pkg <pkg=ver> [--distro <distro>] [--verbose]
```
**Options:**
| Option | Description | Required |
|--------|-------------|----------|
| `--cve` | CVE identifier | Yes |
| `--vuln-pkg` | Vulnerable package (name=version) | Yes |
| `--patch-pkg` | Patched package (name=version) | Yes |
| `--distro` | Distribution (e.g., `debian-bookworm`) | No |
**Example:**
```bash
stella groundtruth pairs create \
--cve CVE-2024-1234 \
--vuln-pkg openssl=3.0.10-1 \
--patch-pkg openssl=3.0.11-1 \
--distro debian-bookworm
```
#### stella groundtruth pairs list
List security pairs in the corpus.
```bash
stella groundtruth pairs list [--cve <pattern>] [--package <name>] [--limit <n>] [--output-format table|json] [--verbose]
```
**Options:**
| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| `--cve` | | Filter by CVE (supports wildcards: `CVE-2024-*`) | - |
| `--package` | `-p` | Filter by package name | - |
| `--limit` | `-l` | Maximum results | 50 |
**Example:**
```bash
stella groundtruth pairs list --cve CVE-2024-* --package openssl --limit 100
```
**Output:**
```
Pair ID CVE Package Vuln Version Patch Version
-------------------------------------------------------------------------------
pair-001 CVE-2024-1234 openssl 3.0.10-1 3.0.11-1
pair-002 CVE-2024-5678 curl 8.4.0-1 8.5.0-1
```
#### stella groundtruth pairs delete
Delete a security pair from the corpus.
```bash
stella groundtruth pairs delete <pair-id> [--force] [--verbose]
```
**Options:**
| Option | Alias | Description |
|--------|-------|-------------|
| `--force` | `-f` | Skip confirmation prompt |
---
### stella groundtruth validate
Run validation harness against security pairs.
**Usage:**
```bash
stella groundtruth validate <command> [options]
```
#### stella groundtruth validate run
Run validation on security pairs.
```bash
stella groundtruth validate run [--pairs <pattern>] [--matcher <type>] [--output <path>] [--parallel <n>] [--verbose]
```
**Options:**
| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| `--pairs` | `-p` | Pair filter pattern (e.g., `openssl:CVE-2024-*`) | all |
| `--matcher` | `-m` | Matcher type: `semantic-diffing`, `hash-based`, `hybrid` | `semantic-diffing` |
| `--output` | `-o` | Output file for validation report | - |
| `--parallel` | | Maximum parallel validations | 4 |
**Example:**
```bash
stella groundtruth validate run \
--pairs "openssl:CVE-2024-*" \
--matcher semantic-diffing \
--parallel 8 \
--output validation-report.md
```
**Output:**
```
Validating pairs: 10/10
Validation complete. Run ID: vr-20260122100532
Function Match Rate: 94.2%
False-Negative Rate: 2.1%
SBOM Hash Stability: 3/3
Report written to: validation-report.md
```
#### stella groundtruth validate metrics
View metrics for a validation run.
```bash
stella groundtruth validate metrics --run-id <id> [--output-format table|json] [--verbose]
```
**Options:**
| Option | Alias | Description | Required |
|--------|-------|-------------|----------|
| `--run-id` | `-r` | Validation run ID | Yes |
**Example:**
```bash
stella groundtruth validate metrics --run-id vr-20260122100532 --output-format json
```
**Output (table):**
```
Run ID: vr-20260122100532
Duration: 2026-01-22T10:00:00Z - 2026-01-22T10:15:32Z
Pairs: 48/50 successful
Function Match Rate: 94.2%
False-Negative Rate: 2.1%
SBOM Hash Stability: 3/3
Verify Time (p50/p95): 423ms / 1.2s
```
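The JSON output can feed automated quality gates. A minimal sketch, assuming the payload exposes a `functionMatchRate` field (the exact schema may differ), using a sample payload in place of a live call:

```bash
# Sample payload standing in for:
#   stella groundtruth validate metrics --run-id <id> --output-format json
# `functionMatchRate` is an assumed field name.
METRICS_JSON='{"functionMatchRate":94.2,"falseNegativeRate":2.1}'

# Extract the rate without jq, then gate on a 90% floor.
RATE=$(printf '%s' "$METRICS_JSON" | sed -n 's/.*"functionMatchRate":\([0-9.]*\).*/\1/p')
if awk "BEGIN { exit !($RATE >= 90) }"; then
  echo "match rate OK: $RATE"
else
  echo "match rate below threshold: $RATE" >&2
  exit 1
fi
```

In a real pipeline, replace `METRICS_JSON` with the captured command output.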
#### stella groundtruth validate export
Export validation report.
```bash
stella groundtruth validate export --run-id <id> --output <path> [--format <fmt>] [--verbose]
```
**Options:**
| Option | Alias | Description | Default |
|--------|-------|-------------|---------|
| `--run-id` | `-r` | Validation run ID | (required) |
| `--output` | `-o` | Output file path | (required) |
| `--format` | `-f` | Export format: `markdown`, `html`, `json` | `markdown` |
**Example:**
```bash
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format markdown \
--output validation-report.md
```
**See Also:** [Ground-Truth CLI Guide](../ground-truth-cli.md)
---
### stella groundtruth bundle
Manage evidence bundles for offline verification of patch provenance.
**Sprint:** SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
**Usage:**
```bash
stella groundtruth bundle <command> [options]
```
**Subcommands:**
- `export` - Create evidence bundles for air-gapped environments
- `import` - Import and verify evidence bundles
#### stella groundtruth bundle export
Export evidence bundles containing pre/post binaries, SBOMs, delta-sig predicates, and timestamps.
```bash
stella groundtruth bundle export [options]
```
**Options:**
| Option | Description | Required |
|--------|-------------|----------|
| `--packages <list>` | Comma-separated package names (e.g., `openssl,curl`) | Yes |
| `--distros <list>` | Comma-separated distributions (e.g., `debian,ubuntu`) | Yes |
| `--output <path>` | Output bundle path (.tar.gz or .oci.tar) | Yes |
| `--sign-with <signer>` | Signing method: `cosign`, `sigstore`, `none` | No |
| `--include-debug` | Include debug symbols | No |
| `--include-kpis` | Include KPI validation results | No |
| `--include-timestamps` | Include RFC 3161 timestamps | No |
**Example:**
```bash
stella groundtruth bundle export \
--packages openssl,zlib,glibc \
--distros debian,fedora \
--output evidence/security-bundle.tar.gz \
--sign-with cosign \
--include-debug \
--include-kpis \
--include-timestamps
```
**Exit Codes:**
- `0` - Bundle created successfully
- `1` - Bundle creation failed
- `2` - Invalid input or configuration error
#### stella groundtruth bundle import
Import and verify evidence bundles in air-gapped environments.
```bash
stella groundtruth bundle import [options]
```
**Options:**
| Option | Description | Required |
|--------|-------------|----------|
| `--input <path>` | Input bundle path | Yes |
| `--verify-signature` | Verify bundle signatures | No |
| `--trusted-keys <path>` | Path to trusted public keys | No |
| `--trust-profile <path>` | Trust profile for verification | No |
| `--output <path>` | Output verification report | No |
| `--format <fmt>` | Report format: `markdown`, `json`, `html` | No |
**Example:**
```bash
stella groundtruth bundle import \
--input symbol-bundle.tar.gz \
--verify-signature \
--trusted-keys /etc/stellaops/trusted-keys.pub \
--trust-profile /etc/stellaops/trust-profiles/global.json \
--output verification-report.md
```
**Verification Steps:**
1. Validate bundle manifest signature
2. Verify all blob digests match manifest
3. Validate DSSE envelope signatures against trusted keys
4. Verify RFC 3161 timestamps against trusted TSA certificates
5. Run IR matcher to confirm patched functions
6. Verify SBOM canonical hash matches signed predicate
7. Output verification report with KPI line items
**Exit Codes:**
- `0` - All verifications passed
- `1` - One or more verifications failed
- `2` - Invalid input or configuration error
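In CI, these exit codes can drive messaging directly. A small sketch mapping the documented codes to human-readable messages (wiring to `stella` is described in the note below):

```bash
# Map documented `bundle import` exit codes to human-readable messages.
describe_exit() {
  case "$1" in
    0) echo "all verifications passed" ;;
    1) echo "one or more verifications failed" ;;
    2) echo "invalid input or configuration error" ;;
    *) echo "unknown exit code: $1" ;;
  esac
}

describe_exit 0
```

In a pipeline, run the import, capture `$?`, pass it to `describe_exit`, and re-exit with the original code so downstream stages see the real status.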
---
### stella groundtruth validate check
Check KPI regression against baseline thresholds.
**Sprint:** SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
```bash
stella groundtruth validate check [options]
```
**Options:**
| Option | Description | Default |
|--------|-------------|---------|
| `--results <path>` | Path to validation results JSON | (required) |
| `--baseline <path>` | Path to baseline JSON | (required) |
| `--precision-threshold <pp>` | Max precision drop (percentage points) | 0.01 |
| `--recall-threshold <pp>` | Max recall drop (percentage points) | 0.01 |
| `--fn-rate-threshold <pp>` | Max FN rate increase (percentage points) | 0.01 |
| `--determinism-threshold <rate>` | Min determinism rate | 1.0 |
| `--ttfrp-threshold <pct>` | Max TTFRP p95 increase (percentage) | 0.20 |
| `--output <path>` | Output report path | stdout |
| `--format <fmt>` | Report format: `markdown`, `json` | `markdown` |
**Example:**
```bash
stella groundtruth validate check \
--results bench/results/20260122.json \
--baseline bench/baselines/current.json \
--precision-threshold 0.01 \
--recall-threshold 0.01 \
--fn-rate-threshold 0.01 \
--determinism-threshold 1.0 \
--output regression-report.md
```
**Regression Gates:**
| Metric | Threshold | Action |
|--------|-----------|--------|
| Precision | Drops > threshold | Fail |
| Recall | Drops > threshold | Fail |
| False-negative rate | Increases > threshold | Fail |
| Deterministic replay | Drops below threshold | Fail |
| TTFRP p95 | Increases > threshold | Warn |
**Exit Codes:**
- `0` - All gates passed
- `1` - One or more gates failed
- `2` - Invalid input or configuration error
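Each gate reduces to a simple comparison. A sketch of the precision gate, treating thresholds as absolute rate deltas (as the defaults above suggest) with illustrative values:

```bash
# Illustrative KPI values; in practice these come from the results/baseline JSON.
baseline_precision=0.9500
current_precision=0.9380
threshold=0.01

# Fail when the drop exceeds the threshold.
if awk "BEGIN { exit !(($baseline_precision - $current_precision) > $threshold) }"; then
  echo "FAIL: precision dropped more than $threshold"
else
  echo "PASS"
fi
```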
---
### stella groundtruth baseline
Manage KPI baselines for regression detection.
**Sprint:** SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
**Usage:**
```bash
stella groundtruth baseline <command> [options]
```
**Subcommands:**
- `update` - Update baseline from validation results
- `show` - Display baseline contents
#### stella groundtruth baseline update
Update baseline from validation results.
```bash
stella groundtruth baseline update [options]
```
**Options:**
| Option | Description | Required |
|--------|-------------|----------|
| `--from-results <path>` | Path to validation results JSON | Yes |
| `--output <path>` | Output baseline path | Yes |
| `--description <text>` | Description for the baseline update | No |
| `--source <commit>` | Source commit SHA for traceability | No |
**Example:**
```bash
stella groundtruth baseline update \
--from-results bench/results/20260122.json \
--output bench/baselines/current.json \
--description "Post algorithm-v2.3 update" \
--source "$(git rev-parse HEAD)"
```
#### stella groundtruth baseline show
Display baseline contents.
```bash
stella groundtruth baseline show --baseline <path> [--format table|json]
```
**Options:**
| Option | Description | Default |
|--------|-------------|---------|
| `--baseline <path>` | Path to baseline JSON | (required) |
| `--format` | Output format: `table`, `json` | `table` |
**Output (table):**
```
Baseline ID: baseline-20260122120000
Created: 2026-01-22T12:00:00Z
Source: abc123def456
Description: Post-semantic-diffing-v2 baseline
KPIs:
Precision: 0.9500
Recall: 0.9200
False Negative Rate: 0.0800
Determinism: 1.0000
TTFRP p95: 150ms
```
**See Also:** [Ground-Truth CLI Guide](../ground-truth-cli.md)
---
## Reporting & Export Commands
### stella report
# Ground-Truth Corpus CLI Guide
**Sprint:** SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli
## Overview
The `stella groundtruth` command group provides CLI access to the ground-truth corpus for patch-paired binary verification. This corpus enables precision validation of security advisories by maintaining symbol and binary pairs from upstream distribution sources.
## Use Cases
- **Security teams**: Validate patch presence in production binaries
- **Compliance auditors**: Generate evidence bundles for air-gapped verification
- **DevSecOps**: Integrate corpus validation into CI/CD pipelines
- **Researchers**: Query symbol databases for vulnerability analysis
## Prerequisites
- Stella CLI installed and configured
- Backend connectivity to Platform service (or offline bundle)
- For sync operations: network access to upstream sources
## Command Structure
```
stella groundtruth
├── sources # Manage symbol source connectors
│ ├── list # List available connectors
│ ├── enable # Enable a connector
│ ├── disable # Disable a connector
│ └── sync # Sync from upstream
├── symbols # Query symbols in corpus
│ ├── lookup # Lookup by debug ID
│ └── search # Search by package/distro
├── pairs # Manage security pairs
│ ├── create # Create vuln/patch pair
│ ├── list # List existing pairs
│ └── delete # Remove a pair
└── validate # Run validation harness
├── run # Execute validation
├── metrics # View run metrics
└── export # Export report
```
## Source Connectors
The ground-truth corpus ingests data from multiple upstream sources:
| Connector ID | Distribution | Data Type | Description |
|--------------|--------------|-----------|-------------|
| `debuginfod-fedora` | Fedora | Debug symbols | ELF debuginfo via debuginfod protocol |
| `debuginfod-ubuntu` | Ubuntu | Debug symbols | ELF debuginfo via debuginfod protocol |
| `ddeb-ubuntu` | Ubuntu | Debug packages | `.ddeb` debug symbol packages |
| `buildinfo-debian` | Debian | Build metadata | `.buildinfo` reproducibility records |
| `secdb-alpine` | Alpine | Security DB | `secfixes` YAML from APKBUILD |
### List Sources
```bash
stella groundtruth sources list
# Output:
ID Display Name Status Last Sync
------------------------------------------------------------------------------------------
debuginfod-fedora Fedora Debuginfod Enabled 2026-01-22T10:00:00Z
debuginfod-ubuntu Ubuntu Debuginfod Enabled 2026-01-22T10:00:00Z
ddeb-ubuntu Ubuntu ddebs Enabled 2026-01-22T09:30:00Z
buildinfo-debian Debian Buildinfo Enabled 2026-01-22T08:00:00Z
secdb-alpine Alpine SecDB Enabled 2026-01-22T06:00:00Z
```
### Enable/Disable Sources
```bash
# Enable a source connector
stella groundtruth sources enable debuginfod-fedora
# Disable a source connector (stops future syncs)
stella groundtruth sources disable debuginfod-fedora
```
### Sync Sources
```bash
# Incremental sync of all enabled sources
stella groundtruth sources sync
# Full sync of a specific source
stella groundtruth sources sync --source buildinfo-debian --full
# Sync with verbose output
stella groundtruth sources sync --source ddeb-ubuntu -v
```
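Recurring syncs are typically scheduled outside the CLI. A hypothetical crontab entry (log path illustrative) that runs an incremental sync nightly at 02:00:

```bash
# m h dom mon dow  command
0 2 * * * stella groundtruth sources sync >> /var/log/stellaops/corpus-sync.log 2>&1
```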
## Symbol Operations
### Lookup by Debug ID
Query symbols using the ELF GNU Build-ID or equivalent identifier:
```bash
# Lookup by build-id
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a
# JSON output
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a --output-format json
```
**Example output:**
```
Binary: libcrypto.so.3
Architecture: x86_64
Distribution: debian-bookworm
Package: openssl@3.0.11-1
Symbol Count: 4523
Sources: debuginfod-fedora, buildinfo-debian
```
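On a live system the debug ID usually comes from the binary itself via `readelf -n` (binutils). A sketch using a sample note line in place of a real binary so it runs anywhere:

```bash
# Sample `readelf -n` line; on a real host use: readelf -n /path/to/binary
NOTE_LINE='    Build ID: 7f8a9b2c4d5e6f1a'

# Extract the hex build-id and feed it to the corpus lookup.
BUILD_ID=$(printf '%s\n' "$NOTE_LINE" | sed -n 's/.*Build ID: \([0-9a-f]*\).*/\1/p')
echo "build-id: $BUILD_ID"
# stella groundtruth symbols lookup --debug-id "$BUILD_ID"
```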
### Search Symbols
Search across the corpus by package name or distribution:
```bash
# Search by package
stella groundtruth symbols search --package openssl
# Filter by distribution
stella groundtruth symbols search --package openssl --distro debian
# Limit results
stella groundtruth symbols search --package curl --limit 100
```
## Security Pairs
Security pairs link vulnerable and patched binary versions for a specific CVE.
### Create a Pair
```bash
stella groundtruth pairs create \
--cve CVE-2024-1234 \
--vuln-pkg openssl=3.0.10-1 \
--patch-pkg openssl=3.0.11-1 \
--distro debian-bookworm
```
### List Pairs
```bash
# List all pairs
stella groundtruth pairs list
# Filter by CVE pattern
stella groundtruth pairs list --cve "CVE-2024-*"
# Filter by package
stella groundtruth pairs list --package openssl --limit 50
# JSON output
stella groundtruth pairs list --output-format json
```
**Example output:**
```
Pair ID CVE Package Vuln Version Patch Version
-------------------------------------------------------------------------------
pair-001 CVE-2024-1234 openssl 3.0.10-1 3.0.11-1
pair-002 CVE-2024-5678 curl 8.4.0-1 8.5.0-1
```
### Delete a Pair
```bash
# Delete with confirmation prompt
stella groundtruth pairs delete pair-001
# Skip confirmation
stella groundtruth pairs delete pair-001 --force
```
## Validation Harness
The validation harness runs end-to-end verification against security pairs.
### Run Validation
```bash
# Validate all pairs
stella groundtruth validate run
# Validate specific pairs (pattern match)
stella groundtruth validate run --pairs "openssl:CVE-2024-*"
# Use specific matcher
stella groundtruth validate run --matcher semantic-diffing
# Parallel validation with report output
stella groundtruth validate run \
--pairs "curl:*" \
--parallel 8 \
--output validation-report.md
```
**Matcher types:**
| Matcher | Description |
|---------|-------------|
| `semantic-diffing` | IR-level semantic comparison (default) |
| `hash-based` | Function hash matching |
| `hybrid` | Combined semantic + hash approach |
### View Metrics
```bash
stella groundtruth validate metrics --run-id vr-20260122100532
# JSON output
stella groundtruth validate metrics --run-id vr-20260122100532 --output-format json
```
**Example output:**
```
Run ID: vr-20260122100532
Duration: 2026-01-22T10:00:00Z - 2026-01-22T10:15:32Z
Pairs: 48/50 successful
Function Match Rate: 94.2%
False-Negative Rate: 2.1%
SBOM Hash Stability: 3/3
Verify Time (p50/p95): 423ms / 1.2s
```
### Export Reports
```bash
# Export as Markdown
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format markdown \
--output report.md
# Export as HTML
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format html \
--output report.html
# Export as JSON (machine-readable)
stella groundtruth validate export \
--run-id vr-20260122100532 \
--format json \
--output report.json
```
## CI/CD Integration
### GitHub Actions Example
```yaml
name: Corpus Validation
on:
  schedule:
    - cron: '0 6 * * 1' # Weekly on Monday
jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Sync corpus sources
        run: stella groundtruth sources sync
      - name: Run validation
        run: |
          stella groundtruth validate run \
            --matcher semantic-diffing \
            --parallel 4 \
            --output validation-${{ github.run_id }}.md | tee validate.log
          # Capture the run ID that `validate run` prints for the next step.
          sed -n 's/.*Run ID: //p' validate.log > run-id.txt
      - name: Check metrics
        run: |
          MATCH_RATE=$(stella groundtruth validate metrics --run-id "$(cat run-id.txt)" --output-format json | jq '.functionMatchRate')
          if (( $(echo "$MATCH_RATE < 90" | bc -l) )); then
            echo "Match rate below threshold: $MATCH_RATE%"
            exit 1
          fi
```
### GitLab CI Example
```yaml
corpus-validation:
  stage: verify
  script:
    - stella groundtruth sources sync --source buildinfo-debian
    - stella groundtruth validate run --pairs "openssl:*" --output report.md
  artifacts:
    paths:
      - report.md
    expire_in: 1 week
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
## Offline Usage
For air-gapped environments, use offline bundles:
```bash
# Export corpus for offline use
stella bundle export \
--include-corpus \
--output corpus-bundle-$(date +%F).tar.gz
# Import on air-gapped system
stella bundle import --package corpus-bundle-2026-01-22.tar.gz
# Run validation offline
stella groundtruth validate run --offline
```
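Before importing on the air-gapped side, it is worth verifying the bundle's integrity with a detached checksum (independent of signature verification). A self-contained sketch with a throwaway file standing in for the real bundle:

```bash
# Create a stand-in bundle so the sketch is self-contained.
tmp=$(mktemp -d)
echo "bundle-bytes" > "$tmp/corpus-bundle.tar.gz"

# Connected side: record the checksum next to the bundle.
( cd "$tmp" && sha256sum corpus-bundle.tar.gz > corpus-bundle.tar.gz.sha256 )

# Air-gapped side: verify before `stella bundle import`.
( cd "$tmp" && sha256sum -c corpus-bundle.tar.gz.sha256 )
```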
## Troubleshooting
### Common Issues
**Sync fails with network error:**
```bash
# Check source status
stella groundtruth sources list
# Retry with verbose output
stella groundtruth sources sync --source debuginfod-ubuntu -v
```
**Symbol lookup returns no results:**
```bash
# Verify debug-id format (hex string)
stella groundtruth symbols lookup --debug-id abc123 -v
# Try searching by package instead
stella groundtruth symbols search --package libcrypto
```
**Validation metrics show low match rate:**
- Check that both vuln and patch binaries are present in corpus
- Verify symbol sources are synced and enabled
- Consider using `hybrid` matcher for complex cases
## See Also
- [CLI Command Reference](commands/reference.md#ground-truth-corpus-commands)
- [BinaryIndex Architecture](../../binary-index/architecture.md)
- [Golden Corpus KPIs](../../benchmarks/golden-corpus-kpis.md)
- [Air-Gap Bundle Guide](../../modules/airgap/README.md)
# Trust Profiles
Trust profiles are offline trust-store templates for bundle verification. They define trust roots, Rekor public keys, and TSA roots in a single file so operators can apply a profile into a local trust store.
Default profile location:
- `etc/trust-profiles/*.trustprofile.json`
- Assets referenced by profiles live under `etc/trust-profiles/assets/`
Profile structure (summary):
- `profileId`: stable identifier (used by CLI commands)
- `trustRoots[]`: signing trust roots (PEM files)
- `rekorKeys[]`: Rekor public keys for offline inclusion proof verification
- `tsaRoots[]`: TSA roots for RFC3161 verification
- `metadata`: optional compliance metadata
CLI usage:
- `stella trust-profile list`
- `stella trust-profile show <profile-id>`
- `stella trust-profile apply <profile-id> --output <dir>`
Profile lookup overrides:
- `--profiles-dir <path>` to point at a custom profiles directory
- `STELLAOPS_TRUST_PROFILES` environment variable for default lookup
Apply output:
- `trust-manifest.json` (trust roots manifest for offline verification)
- `trust-profile.json` (resolved profile copy)
- `trust-root.pem` (combined trust roots for CLI verification)
- `trust-roots/`, `rekor/`, `tsa/` folders with PEM assets
Example apply workflow:
1. `stella trust-profile apply global --output ./trust-store`
2. `stella bundle verify --trust-root ./trust-store/trust-root.pem`
Note:
- Default profiles ship with placeholder roots for scaffolding only. Replace them with compliance-approved roots before production use.
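A hypothetical profile illustrating the shape summarized above (field names follow the summary; the exact schema and asset paths may differ):

```json
{
  "profileId": "global",
  "trustRoots": ["assets/roots/signing-root.pem"],
  "rekorKeys": ["assets/rekor/rekor.pub"],
  "tsaRoots": ["assets/tsa/tsa-root.pem"],
  "metadata": {
    "description": "Scaffolding profile with placeholder roots"
  }
}
```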
---
Concelier normalizes incoming CycloneDX 1.7 and SPDX 3.0.1 documents into the internal `ParsedSbom` model for matching and downstream analysis.
Current extraction coverage (SPRINT_20260119_015):
- Document metadata: format, specVersion, serialNumber, created, name, profiles, sbomType, namespace/imports
- Components: bomRef, type, name, version, purl, cpe, hashes (including SPDX verifiedUsing), license IDs/expressions, license text (base64 decode), external references, properties, scope/modified, supplier/manufacturer, evidence, pedigree, cryptoProperties, modelCard (CycloneDX), swid (CycloneDX), SPDX AI model parameters, SPDX dataset metadata, SPDX file/snippet properties
- Licensing: SPDX Licensing profile elements (listed/custom licenses, license additions, AND/OR/WITH/or-later operators), with OSI/FSF flags and deprecated IDs captured
- Dependencies: component dependency edges (CycloneDX dependencies, SPDX relationships; DependencyOf is inverted to DependsOn)
- Vulnerabilities: CycloneDX embedded vulnerabilities (ratings, affects, VEX analysis), SPDX Security profile vulnerabilities + VEX assessments
- Services: endpoints, authentication, crossesTrustBoundary, data flows, licenses, external references (CycloneDX)
- Formulation: components, workflows, tasks, properties (CycloneDX)
- Declarations/definitions: attestations, affirmations, standards, signatures (CycloneDX)
- Compositions/annotations (CycloneDX)
- Build metadata: buildId, buildType, timestamps, config source, environment, parameters (SPDX)
- Document properties
Notes:
- License expressions can be validated against embedded SPDX license/exception lists via `ILicenseExpressionValidator`.
- Matching currently uses PURL and CPE; additional fields are stored for downstream consumers.
## VEX consumption
When SBOM vulnerabilities include embedded VEX analysis, Concelier consumes the statements
to filter or annotate advisory matches. NotAffected statements can be filtered when policy
allows, and trust evaluation checks timestamps, signatures (when provided), and justification
requirements for not-affected claims.
Configuration (YAML or JSON), loaded from `Concelier:VexConsumption:PolicyPath`:
```yaml
vexConsumptionPolicy:
  trustEmbeddedVex: true
  minimumTrustLevel: Unverified
  filterNotAffected: true
  signatureRequirements:
    requireSignedVex: false
    trustedSigners:
      - "https://example.com/keys/vex-signer"
  timestampRequirements:
    maxAgeHours: 720
    requireTimestamp: true
  conflictResolution:
    strategy: mostRecent
    logConflicts: true
  mergePolicy:
    mode: union
    externalSources:
      - type: repository
        url: "https://vex.example.com/api"
  justificationRequirements:
    requireJustificationForNotAffected: true
    acceptedJustifications:
      - component_not_present
      - vulnerable_code_not_present
      - vulnerable_code_not_in_execute_path
      - inline_mitigations_already_exist
```
Reports are emitted via `VexConsumptionReporter` in JSON, SARIF, and text formats.
Runtime overrides can be supplied via `Concelier:VexConsumption` (Enabled, IgnoreVex,
PolicyPath, TrustEmbeddedVex, MinimumTrustLevel, FilterNotAffected, ExternalVexSources).
```sql
CREATE TABLE vuln.sbom_registry (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    artifact_id TEXT NOT NULL,
    digest TEXT NOT NULL,
    format TEXT NOT NULL CHECK (format IN ('cyclonedx', 'spdx')),
    spec_version TEXT NOT NULL,
    primary_name TEXT,
    primary_version TEXT,
    component_count INT NOT NULL DEFAULT 0,
    affected_count INT NOT NULL DEFAULT 0,
    source TEXT NOT NULL,
    tenant_id TEXT,
    registered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    last_matched_at TIMESTAMPTZ,
    CONSTRAINT uq_sbom_registry_digest UNIQUE (digest)
);

CREATE TABLE vuln.sbom_canonical_match (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    sbom_id UUID NOT NULL REFERENCES vuln.sbom_registry(id),
    canonical_id UUID NOT NULL REFERENCES vuln.advisory_canonical(id),
    purl TEXT NOT NULL,
    match_method TEXT NOT NULL,
    confidence NUMERIC(3,2) NOT NULL DEFAULT 1.0,
    is_reachable BOOLEAN NOT NULL DEFAULT false,
    is_deployed BOOLEAN NOT NULL DEFAULT false,
    matched_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT uq_sbom_canonical_match UNIQUE (sbom_id, canonical_id, purl)
);

CREATE TABLE concelier.sbom_documents (
    id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
    serial_number TEXT NOT NULL,
    artifact_digest TEXT,
    format TEXT NOT NULL CHECK (format IN ('cyclonedx', 'spdx')),
    spec_version TEXT NOT NULL,
    component_count INT NOT NULL DEFAULT 0,
    service_count INT NOT NULL DEFAULT 0,
    vulnerability_count INT NOT NULL DEFAULT 0,
    has_crypto BOOLEAN NOT NULL DEFAULT false,
    has_services BOOLEAN NOT NULL DEFAULT false,
    has_vulnerabilities BOOLEAN NOT NULL DEFAULT false,
    license_ids TEXT[] NOT NULL DEFAULT '{}',
    license_expressions TEXT[] NOT NULL DEFAULT '{}',
    sbom_json JSONB NOT NULL,
    created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
    CONSTRAINT uq_concelier_sbom_serial UNIQUE (serial_number),
    CONSTRAINT uq_concelier_sbom_artifact UNIQUE (artifact_digest)
);
```
---
Provide a single, deterministic aggregation layer for cross-service UX workflows.
- Persist dashboard personalization and layout preferences.
- Provide global search aggregation across entities.
- Surface platform metadata for UI bootstrapping (version, build, offline status).
- Expose analytics lake aggregates for SBOM, vulnerability, and attestation reporting.
## API surface (v1)
### Metadata
- GET `/api/v1/platform/metadata`
- Response includes a capabilities list for UI bootstrapping; analytics capability is reported only when analytics storage is configured.
### Analytics (SBOM lake)
- GET `/api/analytics/suppliers`
- GET `/api/analytics/licenses`
- GET `/api/analytics/vulnerabilities`
- GET `/api/analytics/backlog`
- GET `/api/analytics/attestation-coverage`
- GET `/api/analytics/trends/vulnerabilities`
- GET `/api/analytics/trends/components`
## Data model
- `platform.dashboard_preferences` (dashboard layout, widgets, filters)
- Preferences: `ui.preferences.read`, `ui.preferences.write`
- Search: `search.read` plus downstream service scopes (`findings:read`, `policy:read`, etc.)
- Metadata: `platform.metadata.read`
- Analytics: `analytics.read`
## Determinism and offline posture
- Stable ordering with explicit sort keys and deterministic tiebreakers.
- All timestamps in UTC ISO-8601.
- Cache last-known snapshots for offline rendering with "data as of" markers.
## Analytics ingestion configuration
Analytics ingestion runs inside the Platform WebService and subscribes to Scanner,
Concelier, and Attestor streams. Configure ingestion with `Platform:AnalyticsIngestion`:
```yaml
Platform:
  AnalyticsIngestion:
    Enabled: true
    PostgresConnectionString: "" # optional; defaults to Platform:Storage
    AllowedTenants: ["tenant-a"]
    Streams:
      ScannerStream: "orchestrator:events"
      ConcelierObservationStream: "concelier:advisory.observation.updated:v1"
      ConcelierLinksetStream: "concelier:advisory.linkset.updated:v1"
      AttestorStream: "attestor:events"
      StartFromBeginning: false
    Cas:
      RootPath: "/var/lib/stellaops/cas"
      DefaultBucket: "attestations"
    Attestations:
      BundleUriTemplate: "bundle:{digest}"
`BundleUriTemplate` supports `{digest}` and `{hash}` placeholders. The `bundle:` scheme
maps to `cas://<DefaultBucket>/{digest}` by default. Verify offline bundles with
`stella bundle verify` before ingestion.
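A sketch of the placeholder substitution (template and digest values are illustrative):

```bash
TEMPLATE='bundle:{digest}'
DIGEST='sha256:abc123'

# Substitute the {digest} placeholder; the bundle: scheme then resolves to
# cas://<DefaultBucket>/<digest> per the default mapping described above.
URI=$(printf '%s' "$TEMPLATE" | sed "s/{digest}/$DIGEST/")
echo "$URI"
```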
## Analytics maintenance configuration
Analytics rollups + materialized view refreshes are driven by
`PlatformAnalyticsMaintenanceService` when analytics storage is configured.
Use `BackfillDays` to recompute recent rollups on the first maintenance run (set to `0` to disable).
```yaml
Platform:
  Storage:
    PostgresConnectionString: "Host=...;Database=...;Username=...;Password=..."
  AnalyticsMaintenance:
    Enabled: true
    RunOnStartup: true
    IntervalMinutes: 1440
    ComputeDailyRollups: true
    RefreshMaterializedViews: true
    BackfillDays: 7
## Observability
- Metrics: `platform.aggregate.latency_ms`, `platform.aggregate.errors_total`, `platform.aggregate.cache_hits_total`
---
The service operates strictly downstream of the **Aggregation-Only Contract (AOC)**.
- Compile and evaluate `stella-dsl@1` policy packs into deterministic verdicts.
- Join SBOM inventory, Concelier advisories, and Excititor VEX evidence via canonical linksets and equivalence tables.
- Evaluate SBOM license expressions against policy (SPDX AND/OR/WITH/+), emitting compliance findings and attribution requirements for gate decisions.
- Materialise effective findings (`effective_finding_{policyId}`) with append-only history and produce explain traces.
- Emit CVSS v4.0 receipts with canonical hashing and policy replay/backfill rules; store tenant-scoped receipts with RBAC; export receipts deterministically (UTC/fonts/order) and flag v3.1→v4.0 conversions (see Sprint 0190 CVSS-GAPS-190-014 / `docs/modules/policy/cvss-v4.md`).
- Emit per-finding OpenVEX decisions anchored to reachability evidence, forward them to Signer/Attestor for DSSE/Rekor, and publish the resulting artifacts for bench/verification consumers.
**Usage in policies:**
Determinization scores are exposed to SPL policies via the `signals.trust.*` and `signals.uncertainty.*` namespaces. Use `signals.uncertainty.entropy` to access entropy values and `signals.trust.score` for aggregated trust scores that combine VEX, reachability, runtime, and other signals with decay/weighting.
### 3.2 - License compliance configuration
License compliance evaluation runs during SBOM evaluation when enabled in
`licenseCompliance` settings.
```json
{
  "licenseCompliance": {
    "enabled": true,
    "policyPath": "policies/license-policy.yaml"
  }
}
```
- `sbom.license` exposes the compliance report (findings, conflicts, inventory).
- `sbom.license_status` exposes `pass`, `warn`, or `fail` (or `unknown` when disabled).
- Failures set the policy verdict status to `blocked` and emit `license.*` annotations.
- Trademark notice obligations are tracked alongside attribution requirements and produce warn-level findings.
- License compliance reports support JSON, text/markdown/html, legal-review, and PDF outputs.
- Category breakdown includes percent totals and chart renderings (ASCII chart in text/markdown/legal-review/PDF, pie chart in HTML).
### 3.3 - NTIA compliance configuration
NTIA minimum-elements validation runs when enabled under `ntiaCompliance`.
```json
{
  "ntiaCompliance": {
    "enabled": true,
    "enforceGate": false,
    "policyPath": "policies/ntia-policy.yaml"
  }
}
```
- `sbom.ntia` exposes NTIA compliance details (elements, findings, supplier status).
- `sbom.ntia_status` exposes `pass`, `warn`, `fail`, or `unknown`.
- NTIA compliance can be configured as an advisory-only check or a release gate via `enforceGate`.
- The NTIA policy supports element selection, supplier validation (placeholder patterns, trusted/blocked lists), and framework-specific requirements.
- Reports support JSON, text/markdown/html, and PDF output for regulatory submissions.
---
## 4 · Data Model & Persistence
### 4.1 Collections
---
### Run All Tests
```bash
dotnet test src/ReleaseOrchestrator/StellaOps.ReleaseOrchestrator.sln
```
### Run Only Unit Tests
```bash
dotnet test src/ReleaseOrchestrator/StellaOps.ReleaseOrchestrator.sln --filter "Category=Unit"
```
### Run Only Integration Tests
```bash
dotnet test src/ReleaseOrchestrator/StellaOps.ReleaseOrchestrator.sln --filter "Category=Integration"
```
### Run Specific Test Class
```bash
dotnet test --filter "FullyQualifiedName~PromotionValidatorTests"
```
### Run with Coverage
```bash
dotnet test src/ReleaseOrchestrator/StellaOps.ReleaseOrchestrator.sln --collect:"XPlat Code Coverage"
```
---

**Boundaries.**
* Scanner **does not** produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
* Scanner **does not** keep third-party SBOM warehouses. It may **bind** to existing attestations for exact hashes.
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plug-ins (e.g., patch-presence) run under explicit flags and never contaminate the core SBOM.
SBOM dependency reachability inference uses dependency graphs to reduce false positives and
apply reachability-aware severity adjustments. See `src/Scanner/docs/sbom-reachability-filtering.md`
for policy configuration and reporting expectations.
---
## 1) Solution & project layout
The emitted `buildId` metadata is preserved in component hashes, diff payloads, and `/policy/runtime` responses so operators can pivot from SBOM entries → runtime events → `debug/.build-id/<aa>/<rest>.debug` within the Offline Kit or release bundle.
### 5.5.1 Service security analysis (Sprint 20260119_016)
When an SBOM path is provided, the worker runs the `service-security` stage to parse CycloneDX services and emit a deterministic report covering:
- Endpoint scheme hygiene (HTTP/WS/plaintext protocol detection).
- Authentication and trust-boundary enforcement.
- Sensitive data flow exposure and unencrypted transfers.
- Deprecated service versions and rate-limiting metadata gaps.
Inputs are passed via scan metadata (`sbom.path` or `sbomPath`, plus `sbom.format`). The report is attached as a surface observation payload (`service-security.report`) and keyed in the analysis store for downstream policy and report assembly. See `src/Scanner/docs/service-security.md` for the policy schema and output formats.
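For example, a scan request could carry the SBOM reference in its metadata map. The envelope shape here is an illustrative assumption; only the key names (`sbom.path`, `sbom.format`) come from the documentation above:

```json
{
  "metadata": {
    "sbom.path": "/scans/acme-api/sbom.cdx.json",
    "sbom.format": "cyclonedx-json"
  }
}
```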
### 5.5.2 CBOM crypto analysis (Sprint 20260119_017)
When an SBOM includes CycloneDX `cryptoProperties`, the worker runs the `crypto-analysis` stage to produce a crypto inventory and compliance findings for weak algorithms, short keys, deprecated protocol versions, certificate hygiene, and post-quantum readiness. The report is attached as a surface observation payload (`crypto-analysis.report`) and keyed in the analysis store for downstream evidence workflows. See `src/Scanner/docs/crypto-analysis.md` for the policy schema and inventory export formats.
### 5.5.3 AI/ML supply chain security (Sprint 20260119_018)
When an SBOM includes CycloneDX `modelCard` or SPDX AI profile data, the worker runs the `ai-ml-security` stage to evaluate model governance readiness. The report covers model card completeness, training data provenance, bias/fairness checks, safety risk assessment coverage, and provenance verification. The report is attached as a surface observation payload (`ai-ml-security.report`) and keyed in the analysis store for policy evaluation and audit trails. See `src/Scanner/docs/ai-ml-security.md` for policy schema, CLI toggles, and binary analysis conventions.
### 5.5.4 Build provenance verification (Sprint 20260119_019)
When an SBOM includes CycloneDX formulation or SPDX build profile data, the worker runs the `build-provenance` stage to verify provenance completeness, builder trust, source integrity, hermetic build requirements, and optional reproducibility checks. The report is attached as a surface observation payload (`build-provenance.report`) and keyed in the analysis store for policy enforcement and audit evidence. See `src/Scanner/docs/build-provenance.md` for policy schema, CLI toggles, and report formats.
### 5.5.5 SBOM dependency reachability (Sprint 20260119_022)
When configured, the worker runs the `reachability-analysis` stage to infer dependency reachability from SBOM graphs and optionally refine it with a `richgraph-v1` call graph. Advisory matches are filtered or severity-adjusted using `VulnerabilityReachabilityFilter`, with false-positive reduction metrics recorded for auditability. The stage attaches:
- `reachability.report` (JSON) for component and vulnerability reachability.
- `reachability.report.sarif` (SARIF 2.1.0) for toolchain export.
- `reachability.graph.dot` (GraphViz) for dependency visualization.
Configuration lives in `src/Scanner/docs/sbom-reachability-filtering.md`, including policy schema, metadata keys, and report outputs.
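The DOT export can be rendered with standard GraphViz tooling. A trimmed sketch of what `reachability.graph.dot` might contain (node names and edge/label attributes are illustrative, not the actual export schema):

```dot
digraph reachability {
  "app@1.0.0"    -> "libfoo@2.3.1" [label="runtime"];
  "libfoo@2.3.1" -> "libbar@0.9.0" [label="transitive"];
  "libbar@0.9.0" [color=gray, label="libbar@0.9.0\n(unreachable)"];
}
```

Render it with, e.g., `dot -Tsvg reachability.graph.dot -o reachability.svg`.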
### 5.6 DSSE attestation (via Signer/Attestor)
* WebService constructs **predicate** with `image_digest`, `stellaops_version`, `license_id`, `policy_digest?` (when emitting **final reports**), timestamps.
* Calls **Signer** (requires **OpTok + PoE**); Signer verifies **entitlement + scanner image integrity** and returns **DSSE bundle**.