tests fixes and sprints work
@@ -39,6 +39,7 @@ Key settings:
|
||||
- `subject`: sha256 (+ optional sha512) digest of the bundle target.
|
||||
- `timestamps`: RFC3161/eIDAS timestamp entries with TSA chain/OCSP/CRL refs.
|
||||
- `rekorProofs`: entry body/inclusion proof paths plus signed entry timestamp for offline verification.
|
||||
- Inline artifacts (no `path`) are capped at 4 MiB; larger artifacts are written under `artifacts/`.
|
||||
|
||||
## Dependencies
|
||||
|
||||
@@ -55,6 +56,63 @@ Key settings:
|
||||
- Mirror: `../mirror/`
|
||||
- ExportCenter: `../export-center/`
|
||||
|
||||
## Evidence Bundles for Air-Gapped Verification
|
||||
|
||||
The AirGap module supports golden corpus evidence bundles for offline verification of patch provenance. These bundles enable auditors to verify security patch status without network access.
|
||||
|
||||
### Bundle Contents
|
||||
|
||||
Evidence bundles follow the OCI format and contain:
|
||||
- Pre/post binaries with debug symbols
|
||||
- Canonical SBOM for each binary
|
||||
- DSSE delta-sig predicate proving patch status
|
||||
- Build provenance (if available from buildinfo)
|
||||
- RFC 3161 timestamps for each signed artifact
|
||||
- Validation run results and KPIs
|
||||
|
||||
### Bundle Export
|
||||
|
||||
```bash
stella groundtruth bundle export \
  --packages openssl,zlib,glibc \
  --distros debian,fedora \
  --output symbol-bundle.tar.gz \
  --sign-with cosign
```
|
||||
|
||||
### Bundle Import and Verification
|
||||
|
||||
```bash
stella groundtruth bundle import \
  --input symbol-bundle.tar.gz \
  --verify-signature \
  --trusted-keys /etc/stellaops/trusted-keys.pub \
  --output verification-report.md
```
|
||||
|
||||
### Standalone Verifier
|
||||
|
||||
For air-gapped environments without the full Stella Ops stack, use the standalone verifier:
|
||||
|
||||
```bash
stella-verifier verify \
  --bundle evidence-bundle.oci.tar \
  --trusted-keys trusted-keys.pub \
  --trust-profile eu-eidas.trustprofile.json \
  --output report.json
```
|
||||
|
||||
Exit codes:
|
||||
- `0`: All verifications passed
|
||||
- `1`: One or more verifications failed
|
||||
- `2`: Invalid input or configuration error
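In an automated pipeline the exit codes above can be turned into a readable outcome. The following is a minimal sketch, not part of the verifier itself: the helper names `describe_exit` and `run_verifier` are hypothetical, and only the flags and exit codes documented above are used.

```python
import subprocess

# Exit codes as documented above for the standalone verifier.
EXIT_MEANINGS = {
    0: "all verifications passed",
    1: "one or more verifications failed",
    2: "invalid input or configuration error",
}


def describe_exit(code: int) -> str:
    """Map a verifier exit code to its documented meaning."""
    return EXIT_MEANINGS.get(code, f"unexpected exit code {code}")


def run_verifier(bundle: str, keys: str, profile: str, report: str) -> int:
    """Invoke stella-verifier with the flags shown above; return its exit code."""
    result = subprocess.run(
        [
            "stella-verifier", "verify",
            "--bundle", bundle,
            "--trusted-keys", keys,
            "--trust-profile", profile,
            "--output", report,
        ],
        check=False,  # inspect the exit code ourselves instead of raising
    )
    return result.returncode
```

A caller would typically log `describe_exit(run_verifier(...))` and treat any non-zero code as a failed gate.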
|
||||
|
||||
### Related Documentation
|
||||
|
||||
- [Golden Corpus Layout](../binary-index/golden-corpus-layout.md)
|
||||
- [Golden Corpus Maintenance](../binary-index/golden-corpus-maintenance.md)
|
||||
- [Golden Corpus Operations Runbook](../../runbooks/golden-corpus-operations.md)
|
||||
|
||||
## Current Status
|
||||
|
||||
Implemented with Controller for snapshot export and Importer for secure ingestion. Staleness policies enforce time-bound validity. Integrated with ExportCenter for bundle packaging and all data modules for content export/import.
|
||||
|
||||
@@ -17,7 +17,7 @@ Stella Ops generates rich data through SBOM ingestion, vulnerability correlation
|
||||
|------------|-------------|
|
||||
| Unified component registry | Canonical component table with normalized suppliers and licenses |
|
||||
| Vulnerability correlation | Pre-joined component-vulnerability mapping with EPSS/KEV flags |
|
||||
| VEX-adjusted exposure | Vulnerability counts that respect VEX overrides |
|
||||
| VEX-adjusted exposure | Vulnerability counts that respect active VEX overrides (validity windows applied) |
|
||||
| Attestation tracking | Provenance and SLSA level coverage by environment/team |
|
||||
| Time-series rollups | Daily snapshots for trend analysis |
|
||||
| Materialized views | Pre-computed aggregations for dashboard performance |
|
||||
@@ -68,6 +68,14 @@ Stella Ops generates rich data through SBOM ingestion, vulnerability correlation
|
||||
| `daily_vulnerability_counts` | Rollup | Daily vuln aggregations |
|
||||
| `daily_component_counts` | Rollup | Daily component aggregations |
|
||||
|
||||
Rollup retention is 90 days in hot storage. `compute_daily_rollups()` prunes
|
||||
older rows after each run; archival follows operations runbooks.
|
||||
Platform WebService can automate rollups + materialized view refreshes via
|
||||
`PlatformAnalyticsMaintenanceService` (see `architecture.md` for schedule and
|
||||
configuration).
|
||||
Use `Platform:AnalyticsMaintenance:BackfillDays` to recompute the most recent
|
||||
N days of rollups on the first maintenance run after downtime (set to `0` to disable).
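The interaction between the 90-day retention and `BackfillDays` can be illustrated with a small date calculation. This is a sketch under stated assumptions, not the behavior of `compute_daily_rollups()` itself: the function name `maintenance_plan` is hypothetical, a backfill of N is assumed to cover the N days up to and including today, and rows dated strictly before the cutoff are assumed to be pruned.

```python
from datetime import date, timedelta

RETENTION_DAYS = 90  # hot-storage retention stated above


def maintenance_plan(today: date, backfill_days: int):
    """Return (days_to_recompute, prune_before) for one maintenance run."""
    # Assumption: BackfillDays = N covers the N days up to and including today;
    # 0 disables backfill, so only today's rollup is computed.
    span = max(backfill_days, 1)
    days = [today - timedelta(days=offset) for offset in range(span - 1, -1, -1)]
    # Assumption: rows dated strictly before this cutoff are pruned.
    prune_before = today - timedelta(days=RETENTION_DAYS)
    return days, prune_before
```

The real boundaries (inclusive or exclusive) are defined by the SQL function and the maintenance service; check those before relying on the exact dates.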
|
||||
|
||||
### Materialized Views
|
||||
|
||||
| View | Refresh | Purpose |
|
||||
@@ -77,33 +85,36 @@ Stella Ops generates rich data through SBOM ingestion, vulnerability correlation
|
||||
| `mv_vuln_exposure` | Daily | CVE exposure adjusted by VEX |
|
||||
| `mv_attestation_coverage` | Daily | Provenance/SLSA coverage by env/team |
|
||||
|
||||
Array-valued fields (for example `environments` and `ecosystems`) are ordered
|
||||
alphabetically to keep analytics outputs deterministic.
|
||||
|
||||
## Quick Start
|
||||
|
||||
### Day-1 Queries
|
||||
|
||||
**Top supplier concentration (supply chain risk):**
|
||||
**Top supplier concentration (supply chain risk, optional environment filter):**
|
||||
```sql
|
||||
SELECT * FROM analytics.sp_top_suppliers(20);
|
||||
SELECT analytics.sp_top_suppliers(20, 'prod');
|
||||
```
|
||||
|
||||
**License risk heatmap:**
|
||||
**License risk heatmap (optional environment filter):**
|
||||
```sql
|
||||
SELECT * FROM analytics.sp_license_heatmap();
|
||||
SELECT analytics.sp_license_heatmap('prod');
|
||||
```
|
||||
|
||||
**CVE exposure adjusted by VEX:**
|
||||
```sql
|
||||
SELECT * FROM analytics.sp_vuln_exposure('prod', 'high');
|
||||
SELECT analytics.sp_vuln_exposure('prod', 'high');
|
||||
```
|
||||
|
||||
**Fixable vulnerability backlog:**
|
||||
```sql
|
||||
SELECT * FROM analytics.sp_fixable_backlog('prod');
|
||||
SELECT analytics.sp_fixable_backlog('prod');
|
||||
```
|
||||
|
||||
**Attestation coverage gaps:**
|
||||
```sql
|
||||
SELECT * FROM analytics.sp_attestation_gaps('prod');
|
||||
SELECT analytics.sp_attestation_gaps('prod');
|
||||
```
|
||||
|
||||
### API Endpoints
|
||||
@@ -118,6 +129,82 @@ SELECT * FROM analytics.sp_attestation_gaps('prod');
|
||||
| `/api/analytics/trends/vulnerabilities` | GET | Vulnerability time-series |
|
||||
| `/api/analytics/trends/components` | GET | Component time-series |
|
||||
|
||||
All analytics endpoints require the `analytics.read` scope.
|
||||
The platform metadata capability `analytics` reports whether analytics storage is configured.
|
||||
|
||||
#### Query Parameters
|
||||
- `/api/analytics/suppliers`: `limit` (optional, default 20), `environment` (optional)
|
||||
- `/api/analytics/licenses`: `environment` (optional)
|
||||
- `/api/analytics/vulnerabilities`: `minSeverity` (optional, default `low`), `environment` (optional)
|
||||
- `/api/analytics/backlog`: `environment` (optional)
|
||||
- `/api/analytics/attestation-coverage`: `environment` (optional)
|
||||
- `/api/analytics/trends/vulnerabilities`: `environment` (optional), `days` (optional, default 30)
|
||||
- `/api/analytics/trends/components`: `environment` (optional), `days` (optional, default 30)
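A client can build request URLs from the parameters listed above and leave optional ones out so that the server defaults apply. The sketch below is illustrative only: `analytics_url` is a hypothetical helper and the base URL in the usage note is a placeholder, not a real deployment.

```python
from urllib.parse import urlencode


def analytics_url(base_url: str, endpoint: str, **params) -> str:
    """Build a GET URL, omitting parameters left as None so server defaults apply."""
    query = urlencode({k: v for k, v in params.items() if v is not None})
    url = base_url.rstrip("/") + endpoint
    return f"{url}?{query}" if query else url
```

For example, `analytics_url("https://stella.example", "/api/analytics/vulnerabilities", minSeverity="high", environment="prod")` yields a URL with both query parameters; the request itself must still carry a token with the `analytics.read` scope.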
|
||||
|
||||
## Ingestion Configuration
|
||||
|
||||
Analytics ingestion runs inside the Platform WebService and subscribes to Scanner, Concelier, and Attestor streams. Configure ingestion via `Platform:AnalyticsIngestion`:
|
||||
|
||||
```yaml
Platform:
  Storage:
    PostgresConnectionString: "Host=...;Database=analytics;Username=...;Password=..."
  AnalyticsIngestion:
    Enabled: true
    PostgresConnectionString: "" # optional; defaults to Platform:Storage
    AllowedTenants: ["tenant-a", "tenant-b"]
    Streams:
      ScannerStream: "orchestrator:events"
      ConcelierObservationStream: "concelier:advisory.observation.updated:v1"
      ConcelierLinksetStream: "concelier:advisory.linkset.updated:v1"
      AttestorStream: "attestor:events"
      StartFromBeginning: false
    Cas:
      RootPath: "/var/lib/stellaops/cas"
      DefaultBucket: "attestations"
    Attestations:
      BundleUriTemplate: "bundle:{digest}"
```
|
||||
|
||||
Bundle URI templates support:
|
||||
- `{digest}` for the full digest string (for example `sha256:...`).
|
||||
- `{hash}` for the raw hex digest (no algorithm prefix).
|
||||
- `bundle:{digest}` which resolves to `cas://<DefaultBucket>/{digest}` by default.
|
||||
- `file:/path/to/bundles/bundle-{hash}.json` for offline file ingestion.
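The placeholder rules above can be sketched as a small expansion function. This is an illustration of the documented mapping, not the ingestion service's code: the name `expand_bundle_uri` is hypothetical, and the default bucket value is taken from the sample configuration.

```python
def expand_bundle_uri(template: str, digest: str,
                      default_bucket: str = "attestations") -> str:
    """Expand {digest}/{hash} placeholders and resolve the bundle: scheme."""
    # {hash} is the raw hex digest, i.e. the part after "sha256:" if a prefix exists.
    raw_hash = digest.split(":", 1)[-1]
    uri = template.replace("{digest}", digest).replace("{hash}", raw_hash)
    if uri.startswith("bundle:"):
        # Default mapping described above:
        # bundle:{digest} -> cas://<DefaultBucket>/{digest}
        uri = f"cas://{default_bucket}/{uri[len('bundle:'):]}"
    return uri
```

With the sample configuration, `bundle:{digest}` for `sha256:abc123` expands to `cas://attestations/sha256:abc123`, while the `file:` template keeps its scheme and only substitutes the raw hash.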
|
||||
|
||||
For offline workflows, verify bundles with `stella bundle verify` before ingesting them.
|
||||
|
||||
## Console UI
|
||||
|
||||
SBOM Lake analytics are exposed in the Console under `Analytics > SBOM Lake` (`/analytics/sbom-lake`).
|
||||
Console access requires `ui.read` plus `analytics.read` scopes.
|
||||
|
||||
Key UI features:
|
||||
- Filters for environment, minimum severity, and time window.
|
||||
- Panels for suppliers, licenses, vulnerability exposure, and attestation coverage.
|
||||
- Trend views for vulnerabilities and components.
|
||||
- Fixable backlog table with CSV export.
|
||||
|
||||
See [console.md](./console.md) for operator guidance and filter behavior.
|
||||
|
||||
## CLI Access
|
||||
|
||||
SBOM lake analytics are exposed via the CLI under `stella analytics sbom-lake`
|
||||
(requires `analytics.read` scope).
|
||||
|
||||
```bash
# Top suppliers
stella analytics sbom-lake suppliers --limit 20

# Vulnerability exposure in prod (high+), CSV export
stella analytics sbom-lake vulnerabilities --environment prod --min-severity high --format csv --output vuln.csv

# 30-day trends for both series
stella analytics sbom-lake trends --days 30 --series all --format json
```
|
||||
|
||||
See `docs/modules/cli/guides/commands/analytics.md` for command-level details.
|
||||
|
||||
## Architecture
|
||||
|
||||
See [architecture.md](./architecture.md) for detailed design decisions, data flow, and normalization rules.
|
||||
@@ -133,4 +220,6 @@ See [analytics_schema.sql](../../db/analytics_schema.sql) for complete DDL inclu
|
||||
|
||||
## Sprint Reference
|
||||
|
||||
Implementation tracked in: `docs/implplan/SPRINT_20260120_030_Platform_sbom_analytics_lake.md`
|
||||
Implementation tracked in:
|
||||
- `docs/implplan/SPRINT_20260120_030_Platform_sbom_analytics_lake.md`
|
||||
- `docs/implplan/SPRINT_20260120_032_Cli_sbom_analytics_cli.md`
|
||||
|
||||
@@ -7,7 +7,7 @@ The Analytics module implements a **star-schema data warehouse** pattern optimiz
|
||||
1. **Separation of concerns**: Analytics schema is isolated from operational schemas (scanner, vex, proof_system)
|
||||
2. **Pre-computation**: Expensive aggregations computed in advance via materialized views
|
||||
3. **Audit trail**: Raw payloads preserved for reprocessing and compliance
|
||||
4. **Determinism**: All normalization functions are immutable and reproducible
|
||||
4. **Determinism**: Normalization functions are immutable and reproducible; array aggregates are ordered for stable outputs
|
||||
5. **Incremental updates**: Supports both full refresh and incremental ingestion
|
||||
|
||||
## Data Flow
|
||||
@@ -120,10 +120,9 @@ When a component is upserted, the `VulnerabilityCorrelationService` queries Conc
|
||||
2. Filter by version range matching
|
||||
3. Upsert to `component_vulns` with severity, EPSS, KEV flags
|
||||
|
||||
**Version range matching** uses Concelier's existing logic to handle:
|
||||
- Semver ranges: `>=1.0.0 <2.0.0`
|
||||
- Exact versions: `1.2.3`
|
||||
- Wildcards: `1.x`
|
||||
**Version range matching** currently supports semver ranges and exact matches via
|
||||
`VersionRuleEvaluator`. Non-semver schemes fall back to exact string matches; wildcard
|
||||
and ecosystem-specific ranges require upstream normalization.
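The matching behavior described above (semver ranges, exact matches, and an exact-string fallback for non-semver schemes) can be sketched as follows. This is not the `VersionRuleEvaluator` implementation: the function names are hypothetical, only plain `MAJOR.MINOR.PATCH` versions are parsed, and pre-release or build metadata is deliberately out of scope.

```python
import operator
import re

_SEMVER = re.compile(r"^(\d+)\.(\d+)\.(\d+)$")
_CLAUSE = re.compile(r"^(>=|<=|>|<|=)?(.+)$")
_OPS = {">=": operator.ge, "<=": operator.le, ">": operator.gt,
        "<": operator.lt, "=": operator.eq}


def _parse(version: str):
    """Return (major, minor, patch) or None when the text is not plain semver."""
    m = _SEMVER.match(version)
    return tuple(int(part) for part in m.groups()) if m else None


def version_matches(version: str, rule: str) -> bool:
    """Evaluate a rule such as '>=1.0.0 <2.0.0' or '1.2.3' against a version."""
    parsed = _parse(version)
    if parsed is None:
        # Non-semver scheme: fall back to an exact string comparison.
        return version == rule
    clauses = rule.split()
    if not clauses:
        return False
    for clause in clauses:
        op, bound_text = _CLAUSE.match(clause).groups()
        bound = _parse(bound_text)
        # Wildcards such as '1.x' do not parse and therefore never match here.
        if bound is None or not _OPS[op or "="](parsed, bound):
            return False
    return True
```

All clauses in a rule must hold, so `>=1.0.0 <2.0.0` matches `1.5.0` but not `2.0.0`.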
|
||||
|
||||
## VEX Override Logic
|
||||
|
||||
@@ -145,7 +144,21 @@ COUNT(DISTINCT ac.artifact_id) FILTER (
|
||||
**Override validity:**
|
||||
- `valid_from`: When the override became effective
|
||||
- `valid_until`: Expiration (NULL = no expiration)
|
||||
- Only `status = 'not_affected'` reduces exposure counts
|
||||
- Only `status = 'not_affected'` reduces exposure counts, and only when the override is active in its validity window.
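The validity rule can be expressed as a single predicate. The sketch below mirrors the window condition used in the analytics SQL (`valid_from <= now()` and `valid_until IS NULL OR valid_until > now()`); the function name `override_reduces_exposure` is hypothetical.

```python
from datetime import datetime, timezone
from typing import Optional


def override_reduces_exposure(status: str,
                              valid_from: datetime,
                              valid_until: Optional[datetime],
                              now: datetime) -> bool:
    """True only for an active 'not_affected' override."""
    if status != "not_affected":
        return False
    if valid_from > now:
        return False  # override is not effective yet
    # A missing valid_until means the override never expires.
    return valid_until is None or valid_until > now
```

An override that expired yesterday, or one that starts tomorrow, leaves the exposure count unchanged.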
|
||||
|
||||
## Attestation Ingestion
|
||||
|
||||
Attestation ingestion consumes Attestor Rekor entry events and expects Sigstore bundles
|
||||
or raw DSSE envelopes. The ingestion service:
|
||||
- Resolves bundle URIs using `BundleUriTemplate`; `bundle:{digest}` maps to
|
||||
`cas://<DefaultBucket>/{digest}` by default.
|
||||
- Decodes DSSE payloads, computes `dsse_payload_hash`, and records `predicate_uri` plus
|
||||
Rekor log metadata (`rekor_log_id`, `rekor_log_index`).
|
||||
- Uses in-toto `subject` digests to link artifacts when reanalysis hints are absent.
|
||||
- Maps predicate URIs into `analytics_attestation_type` values
|
||||
(`provenance`, `sbom`, `vex`, `build`, `scan`, `policy`).
|
||||
- Expands VEX statements into `vex_overrides` rows, one per product reference, and
|
||||
captures optional validity timestamps when provided.
|
||||
|
||||
## Time-Series Rollups
|
||||
|
||||
@@ -164,14 +177,14 @@ Daily rollups computed by `compute_daily_rollups()`:
|
||||
- `total_components`: Distinct components
|
||||
- `unique_suppliers`: Distinct normalized suppliers
|
||||
|
||||
**Retention policy:** 90 days in hot storage; older data archived to cold storage.
|
||||
**Retention policy:** 90 days in hot storage; `compute_daily_rollups()` prunes older rows and downstream jobs archive to cold storage.
|
||||
|
||||
## Materialized View Refresh
|
||||
|
||||
All materialized views support `REFRESH ... CONCURRENTLY` for zero-downtime updates:
|
||||
|
||||
```sql
|
||||
-- Refresh all views (run daily via pg_cron or Scheduler)
|
||||
-- Refresh all views (non-concurrent; run off-peak)
|
||||
SELECT analytics.refresh_all_views();
|
||||
```
|
||||
|
||||
@@ -182,6 +195,19 @@ SELECT analytics.refresh_all_views();
|
||||
- `mv_attestation_coverage`: 02:45 UTC daily
|
||||
- `compute_daily_rollups()`: 03:00 UTC daily
|
||||
|
||||
Platform WebService can run the daily rollup + refresh loop via
|
||||
`PlatformAnalyticsMaintenanceService`. Configure the schedule with:
|
||||
- `Platform:AnalyticsMaintenance:Enabled` (default `true`)
|
||||
- `Platform:AnalyticsMaintenance:IntervalMinutes` (default `1440`)
|
||||
- `Platform:AnalyticsMaintenance:RunOnStartup` (default `true`)
|
||||
- `Platform:AnalyticsMaintenance:ComputeDailyRollups` (default `true`)
|
||||
- `Platform:AnalyticsMaintenance:RefreshMaterializedViews` (default `true`)
|
||||
- `Platform:AnalyticsMaintenance:BackfillDays` (default `0`, which disables backfill; a positive value N recomputes the most recent N days on the first maintenance run)
|
||||
|
||||
The hosted service issues concurrent refresh statements directly for each view.
|
||||
Use a DB scheduler (pg_cron) or external orchestrator if you need the staggered
|
||||
per-view timing above.
|
||||
|
||||
## Performance Considerations
|
||||
|
||||
### Indexing Strategy
|
||||
@@ -198,9 +224,9 @@ SELECT analytics.refresh_all_views();
|
||||
|
||||
| Query | Target | Notes |
|
||||
|-------|--------|-------|
|
||||
| `sp_top_suppliers(20)` | < 100ms | Uses materialized view |
|
||||
| `sp_license_heatmap()` | < 100ms | Uses materialized view |
|
||||
| `sp_vuln_exposure()` | < 200ms | Uses materialized view |
|
||||
| `sp_top_suppliers(20, 'prod')` | < 100ms | Uses materialized view when env is null; env filter reads base tables |
|
||||
| `sp_license_heatmap('prod')` | < 100ms | Uses materialized view when env is null; env filter reads base tables |
|
||||
| `sp_vuln_exposure()` | < 200ms | Uses materialized view for global queries; environment filters read base tables |
|
||||
| `sp_fixable_backlog()` | < 500ms | Live query with indexes |
|
||||
| `sp_attestation_gaps()` | < 100ms | Uses materialized view |
|
||||
|
||||
@@ -246,12 +272,12 @@ All tables include `created_at` and `updated_at` timestamps. Raw payload tables
|
||||
|
||||
### Upstream Dependencies
|
||||
|
||||
| Service | Event | Action |
|
||||
|---------|-------|--------|
|
||||
| Scanner | SBOM ingested | Normalize and upsert components |
|
||||
| Concelier | Advisory updated | Re-correlate affected components |
|
||||
| Excititor | VEX observation | Create/update vex_overrides |
|
||||
| Attestor | Attestation created | Upsert attestation record |
|
||||
| Service | Event | Contract | Action |
|
||||
|---------|-------|----------|--------|
|
||||
| Scanner | SBOM report ready | `scanner.event.report.ready@1` (`docs/modules/signals/events/orchestrator-scanner-events.md`) | Normalize and upsert components |
|
||||
| Concelier | Advisory observation/linkset updated | `advisory.observation.updated@1` (`docs/modules/concelier/events/advisory.observation.updated@1.schema.json`), `advisory.linkset.updated@1` (`docs/modules/concelier/events/advisory.linkset.updated@1.md`) | Re-correlate affected components |
|
||||
| Excititor | VEX statement changes | `vex.statement.*` (`docs/modules/excititor/architecture.md`) | Create/update vex_overrides |
|
||||
| Attestor | Rekor entry logged | `rekor.entry.logged` (`docs/modules/attestor/architecture.md`) | Upsert attestation record |
|
||||
|
||||
### Downstream Consumers
|
||||
|
||||
|
||||
64
docs/modules/analytics/console.md
Normal file
@@ -0,0 +1,64 @@
|
||||
# Analytics Console (SBOM Lake)
|
||||
|
||||
The Console exposes SBOM analytics lake data under `Analytics > SBOM Lake`.
|
||||
This view is read-only and uses the analytics API endpoints documented in `docs/modules/analytics/README.md`.
|
||||
|
||||
## Access
|
||||
|
||||
- Route: `/analytics/sbom-lake`
|
||||
- Required scopes: `ui.read` and `analytics.read`
|
||||
- Console admin bundles: `role/analytics-viewer`, `role/analytics-operator`, `role/analytics-admin`
|
||||
- Data freshness: the page surfaces the latest `dataAsOf` timestamp returned by the API.
|
||||
|
||||
## Filters
|
||||
|
||||
The SBOM Lake page supports three filters that round-trip via URL query parameters:
|
||||
|
||||
- Environment: `env` (optional, example: `Prod`)
|
||||
- Minimum severity: `severity` (optional, example: `high`)
|
||||
- Time window (days): `days` (optional, example: `90`)
|
||||
|
||||
When a filter changes, the Console reloads all panels using the updated parameters.
|
||||
Supplier and license panels honor the environment filter alongside the other views.
|
||||
|
||||
## Panels
|
||||
|
||||
The dashboard presents four summary panels:
|
||||
|
||||
1. Supplier concentration (top suppliers by component count)
|
||||
2. License distribution (license categories and counts)
|
||||
3. Vulnerability exposure (top CVEs after VEX adjustments)
|
||||
4. Attestation coverage (provenance and SLSA 2+ coverage)
|
||||
|
||||
Each panel shows a loading state, empty state, and summary counts.
|
||||
|
||||
## Trends
|
||||
|
||||
Two trend panels are included:
|
||||
|
||||
- Vulnerability trend: net exposure over the selected time window
|
||||
- Component trend: total components and unique suppliers
|
||||
|
||||
The Console aggregates trend points by date and renders a simple bar chart plus a compact list.
|
||||
|
||||
## Fixable Backlog
|
||||
|
||||
The fixable backlog table lists vulnerabilities with fixes available, grouped by component and service.
|
||||
The "Top backlog components" table derives a component summary from the same backlog data.
|
||||
|
||||
### CSV Export
|
||||
|
||||
The "Export backlog CSV" action downloads a deterministic, ordered CSV with:
|
||||
|
||||
- Service
|
||||
- Component
|
||||
- Version
|
||||
- Vulnerability
|
||||
- Severity
|
||||
- Environment
|
||||
- Fixed version
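A deterministic export means the same backlog always produces byte-identical CSV regardless of the order in which rows arrive. The sketch below shows one way to get that property; it is not the Console's implementation, the helper name `backlog_csv` is hypothetical, and sorting by every column in header order is an assumption about the sort key.

```python
import csv
import io

COLUMNS = ["Service", "Component", "Version", "Vulnerability",
           "Severity", "Environment", "Fixed version"]


def backlog_csv(rows: list[dict]) -> str:
    """Render backlog rows as CSV with a fixed column order and stable row order."""
    # Assumption: rows are ordered by all columns, left to right.
    ordered = sorted(rows, key=lambda r: tuple(str(r.get(c, "")) for c in COLUMNS))
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=COLUMNS, lineterminator="\n")
    writer.writeheader()
    writer.writerows(ordered)
    return out.getvalue()
```

Because both the column order and the row order are fixed, two exports of the same data can be compared with a plain diff or hash.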
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
- If panels show "No data", verify that the analytics schema and materialized views are populated.
|
||||
- If an error banner appears, check the analytics API availability and ensure the tenant has `analytics.read`.
|
||||
@@ -9,8 +9,8 @@ This document provides ready-to-use SQL queries for common analytics use cases.
|
||||
Identifies suppliers with the highest component footprint, indicating supply chain concentration risk.
|
||||
|
||||
```sql
|
||||
-- Via stored procedure (recommended)
|
||||
SELECT * FROM analytics.sp_top_suppliers(20);
|
||||
-- Via stored procedure (recommended, optional environment filter)
|
||||
SELECT analytics.sp_top_suppliers(20, 'prod');
|
||||
|
||||
-- Direct query
|
||||
SELECT
|
||||
@@ -33,8 +33,8 @@ LIMIT 20;
|
||||
Shows distribution of components by license category for compliance review.
|
||||
|
||||
```sql
|
||||
-- Via stored procedure
|
||||
SELECT * FROM analytics.sp_license_heatmap();
|
||||
-- Via stored procedure (optional environment filter)
|
||||
SELECT analytics.sp_license_heatmap('prod');
|
||||
|
||||
-- Direct query with grouping
|
||||
SELECT
|
||||
@@ -62,9 +62,9 @@ Shows true vulnerability exposure after applying VEX mitigations.
|
||||
|
||||
```sql
|
||||
-- Via stored procedure
|
||||
SELECT * FROM analytics.sp_vuln_exposure('prod', 'high');
|
||||
SELECT analytics.sp_vuln_exposure('prod', 'high');
|
||||
|
||||
-- Direct query showing VEX effectiveness
|
||||
-- Direct query showing VEX effectiveness (global view; use sp_vuln_exposure for environment filtering)
|
||||
SELECT
|
||||
vuln_id,
|
||||
severity::TEXT,
|
||||
@@ -97,7 +97,7 @@ Lists vulnerabilities that can be fixed today (fix available, not VEX-mitigated)
|
||||
|
||||
```sql
|
||||
-- Via stored procedure
|
||||
SELECT * FROM analytics.sp_fixable_backlog('prod');
|
||||
SELECT analytics.sp_fixable_backlog('prod');
|
||||
|
||||
-- Direct query with priority scoring
|
||||
SELECT
|
||||
@@ -130,6 +130,7 @@ JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
|
||||
LEFT JOIN analytics.vex_overrides vo ON vo.artifact_id = a.artifact_id
|
||||
AND vo.vuln_id = cv.vuln_id
|
||||
AND vo.status = 'not_affected'
|
||||
AND vo.valid_from <= now()
|
||||
AND (vo.valid_until IS NULL OR vo.valid_until > now())
|
||||
WHERE cv.affects = TRUE
|
||||
AND cv.fix_available = TRUE
|
||||
@@ -147,7 +148,7 @@ Shows attestation gaps by environment and team.
|
||||
|
||||
```sql
|
||||
-- Via stored procedure
|
||||
SELECT * FROM analytics.sp_attestation_gaps('prod');
|
||||
SELECT analytics.sp_attestation_gaps('prod');
|
||||
|
||||
-- Direct query with gap analysis
|
||||
SELECT
|
||||
@@ -267,6 +268,7 @@ JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
|
||||
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
|
||||
LEFT JOIN analytics.vex_overrides vo ON vo.artifact_id = a.artifact_id
|
||||
AND vo.vuln_id = cv.vuln_id
|
||||
AND vo.valid_from <= now()
|
||||
AND (vo.valid_until IS NULL OR vo.valid_until > now())
|
||||
WHERE cv.vuln_id = 'CVE-2021-44228'
|
||||
ORDER BY a.environment, a.name;
|
||||
@@ -312,7 +314,7 @@ SELECT
|
||||
c.license_category::TEXT,
|
||||
c.supplier_normalized AS supplier,
|
||||
COUNT(DISTINCT a.artifact_id) AS artifact_count,
|
||||
ARRAY_AGG(DISTINCT a.name) AS affected_artifacts
|
||||
ARRAY_AGG(DISTINCT a.name ORDER BY a.name) AS affected_artifacts
|
||||
FROM analytics.components c
|
||||
JOIN analytics.artifact_components ac ON ac.component_id = c.component_id
|
||||
JOIN analytics.artifacts a ON a.artifact_id = ac.artifact_id
|
||||
@@ -340,6 +342,8 @@ SELECT
|
||||
FROM analytics.component_vulns cv
|
||||
JOIN analytics.vex_overrides vo ON vo.vuln_id = cv.vuln_id
|
||||
AND vo.status = 'not_affected'
|
||||
AND vo.valid_from <= now()
|
||||
AND (vo.valid_until IS NULL OR vo.valid_until > now())
|
||||
WHERE cv.published_at >= now() - INTERVAL '90 days'
|
||||
AND cv.published_at IS NOT NULL
|
||||
GROUP BY cv.severity
|
||||
|
||||
@@ -14,7 +14,7 @@ StellaOps SBOM interoperability tests ensure compatibility with third-party secu
|
||||
| SPDX | 3.0.1 | ✅ Supported | 95%+ |
|
||||
|
||||
Notes:
|
||||
- SPDX 3.0.1 generation currently emits JSON-LD `@context`, `spdxVersion`, core document/package/relationship elements, software package/file/snippet metadata, build profile elements with output relationships, security vulnerabilities with assessment relationships, verifiedUsing hashes/signatures, and external references/identifiers. Full profile coverage is tracked in SPRINT_20260119_014.
|
||||
- SPDX 3.0.1 generation currently emits JSON-LD `@context`, `spdxVersion`, core document/package/relationship elements (including agent/tool elements for creationInfo), software package/file/snippet metadata, build profile elements with output relationships, security vulnerabilities with assessment relationships, licensing license elements with declared/concluded relationships, AI AIPackage metadata (autonomy, domain, metrics, safety risk assessment), Dataset package metadata (type, collection, preprocessing, availability), verifiedUsing hashes/signatures, external references/identifiers (including externalRef contentType when available), namespaceMap/imports for cross-document references, extension metadata via SbomExtension namespace/properties on document/component/vulnerability elements, and Lite profile output (opt-in via SpdxWriterOptions.UseLiteProfile). Full profile coverage is tracked in SPRINT_20260119_014.
|
||||
|
||||
### Third-Party Tools
|
||||
|
||||
|
||||
@@ -29,11 +29,14 @@ Use the bundle verification flow aligned to domain operations:
|
||||
|
||||
```bash
|
||||
stella bundle verify --bundle /path/to/bundle --offline --trust-root /path/to/tsa-root.pem --rekor-checkpoint /path/to/checkpoint.json
|
||||
stella bundle verify --bundle /path/to/bundle --offline --signer /path/to/report-key.pem --signer-cert /path/to/report-cert.pem
|
||||
```
|
||||
|
||||
Notes:
|
||||
- Offline mode fails closed when revocation evidence is missing or invalid.
|
||||
- Offline mode fails closed when revocation evidence is missing or invalid.
|
||||
- Trust roots must be provided locally; no network fetches are allowed.
|
||||
- When `--signer` is set, a DSSE report is written to `out/verification.report.json`.
|
||||
- Signed report metadata includes `verifier.algo`, `verifier.cert`, `signed_at`.
|
||||
|
||||
## 4. Verification Behavior
|
||||
|
||||
|
||||
@@ -1239,7 +1239,183 @@ binaryindex:
|
||||
|
||||
---
|
||||
|
||||
## 10. References
|
||||
## 10. Golden Corpus for Patch Provenance
|
||||
|
||||
> **Sprint:** SPRINT_20260121_034/035/036 - Golden Corpus Implementation
|
||||
|
||||
The BinaryIndex module supports a **golden corpus** of patch-paired artifacts that enables offline SBOM reproducibility and binary-level patch provenance verification.
|
||||
|
||||
### 10.1 Corpus Purpose
|
||||
|
||||
The golden corpus provides:
|
||||
- **Auditor-ready evidence bundles** for air-gapped customers
|
||||
- **Regression testing** for binary matching accuracy
|
||||
- **Proof of patch status** independent of package metadata
|
||||
|
||||
### 10.2 Corpus Sources
|
||||
|
||||
| Source | Type | Purpose |
|--------|------|---------|
| Debian Security Tracker / DSAs | Advisory | Primary advisory linkage |
| Debian Snapshot | Binary archive | Pre/post patch binary pairs |
| Ubuntu Security Notices | Advisory | Ubuntu-specific advisories |
| Alpine secdb | Advisory | Alpine YAML advisories |
| OSV dump | Unified schema | Cross-reference and commit ranges |
|
||||
|
||||
### 10.2.1 Symbol Source Connectors
|
||||
|
||||
> **Sprint:** SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli
|
||||
|
||||
The corpus ingestion layer uses pluggable connectors to retrieve symbols and metadata from upstream sources:
|
||||
|
||||
| Connector ID | Implementation | Protocol | Data Retrieved |
|--------------|----------------|----------|----------------|
| `debuginfod-fedora` | `DebuginfodConnector` | debuginfod HTTP | ELF debug symbols by Build-ID |
| `debuginfod-ubuntu` | `DebuginfodConnector` | debuginfod HTTP | ELF debug symbols by Build-ID |
| `ddeb-ubuntu` | `DdebConnector` | APT/HTTP | `.ddeb` debug packages |
| `buildinfo-debian` | `BuildinfoConnector` | HTTP | `.buildinfo` reproducibility records |
| `secdb-alpine` | `AlpineSecDbConnector` | Git/HTTP | `secfixes` YAML from APKBUILD |
|
||||
|
||||
**Connector Interface:**
|
||||
|
||||
```csharp
public interface ISymbolSourceConnector
{
    string ConnectorId { get; }
    string DisplayName { get; }
    string[] SupportedDistros { get; }

    Task<ConnectorStatus> GetStatusAsync(CancellationToken ct);
    Task SyncAsync(SyncOptions options, CancellationToken ct);
    Task<SymbolLookupResult?> LookupByBuildIdAsync(string buildId, CancellationToken ct);
    Task<IAsyncEnumerable<SymbolRecord>> SearchAsync(SymbolSearchQuery query, CancellationToken ct);
}
```
|
||||
|
||||
**Debuginfod Connector:**
|
||||
|
||||
The `DebuginfodConnector` implements the [debuginfod protocol](https://sourceware.org/elfutils/Debuginfod.html) for retrieving debug symbols:
|
||||
|
||||
- Endpoint: `GET /buildid/<build-id>/debuginfo`
|
||||
- Supports federated queries across multiple debuginfod servers
|
||||
- Caches retrieved symbols in RustFS blob storage
|
||||
- Rate-limited to respect upstream server policies
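A federated lookup amounts to trying the same Build-ID path against each configured server in turn. The sketch below only builds the candidate URLs from the endpoint shown above; it is not the `DebuginfodConnector` code, the helper name `debuginfo_urls` is hypothetical, and the server list passed in is up to the caller.

```python
import re

_BUILD_ID = re.compile(r"^[0-9a-f]+$")


def debuginfo_urls(servers: list[str], build_id: str) -> list[str]:
    """Return one /buildid/<build-id>/debuginfo URL per server, in query order."""
    build_id = build_id.lower()
    if not _BUILD_ID.match(build_id):
        raise ValueError(f"not a hex Build-ID: {build_id!r}")
    return [f"{server.rstrip('/')}/buildid/{build_id}/debuginfo" for server in servers]
```

A connector would request these URLs in order, stop at the first successful response, and apply its own caching and rate limiting around the loop.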
|
||||
|
||||
**Ubuntu ddeb Connector:**
|
||||
|
||||
The `DdebConnector` retrieves Ubuntu debug symbol packages (`.ddeb`):
|
||||
|
||||
- Sources: `ddebs.ubuntu.com` mirror
|
||||
- Indexes: Reads `Packages.xz` for package metadata
|
||||
- Extraction: Unpacks `.ddeb` AR archives to extract DWARF symbols
|
||||
- Mapping: Links debug symbols to binary packages via Build-ID
|
||||
|
||||
**Debian Buildinfo Connector:**
|
||||
|
||||
The `BuildinfoConnector` retrieves Debian buildinfo files for reproducibility verification:
|
||||
|
||||
- Source: `buildinfos.debian.net` and snapshot archives
|
||||
- Purpose: Provides build environment metadata for reproducible builds
|
||||
- Fields extracted: `Build-Date`, `Build-Architecture`, `Checksums-Sha256`
|
||||
- Integration: Cross-references with binary packages for provenance
|
||||
|
||||
**Alpine SecDB Connector:**
|
||||
|
||||
The `AlpineSecDbConnector` parses Alpine's security database:
|
||||
|
||||
- Source: `secfixes` blocks in APKBUILD files
|
||||
- Repository: `alpine/aports` Git repository
|
||||
- Format: YAML blocks mapping CVEs to fixed versions
|
||||
- Example:
|
||||
```yaml
  secfixes:
    3.0.11-r0:
      - CVE-2024-0727
      - CVE-2024-0728
```
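For correlation it is usually more convenient to look up the fixed version by CVE than the other way round. The sketch below inverts a `secfixes` mapping that has already been parsed into a dictionary; it is not the `AlpineSecDbConnector` code, the name `index_by_cve` is hypothetical, and YAML parsing is left out so the example needs no third-party library.

```python
from collections import defaultdict

# The secfixes example above, already parsed: fixed version -> CVE list.
SECFIXES_EXAMPLE = {
    "3.0.11-r0": ["CVE-2024-0727", "CVE-2024-0728"],
}


def index_by_cve(secfixes: dict) -> dict:
    """Invert a secfixes mapping into CVE -> list of fixed versions."""
    index = defaultdict(list)
    for fixed_version, cves in secfixes.items():
        for cve in cves:
            index[cve].append(fixed_version)
    return dict(index)
```

A CVE maps to a list because the same identifier can appear under more than one fixed version across branches.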
|
||||
|
||||
**OSV Dump Parser:**
|
||||
|
||||
The `OsvDumpParser` processes Google OSV database dumps for advisory cross-correlation:
|
||||
|
||||
- Source: `osv.dev` bulk exports (JSON)
|
||||
- Purpose: CVE → commit range extraction for patch identification
|
||||
- Cross-reference: Correlates OSV entries with distribution advisories
|
||||
- Inconsistency detection: Identifies discrepancies between OSV and distro advisories
|
||||
|
||||
```csharp
public interface IOsvDumpParser
{
    IAsyncEnumerable<OsvParsedEntry> ParseDumpAsync(Stream osvDumpStream, CancellationToken ct);
    OsvCveIndex BuildCveIndex(IEnumerable<OsvParsedEntry> entries);
    IEnumerable<AdvisoryCorrelation> CrossReferenceWithExternal(
        OsvCveIndex osvIndex,
        IEnumerable<ExternalAdvisory> externalAdvisories);
    IEnumerable<AdvisoryInconsistency> DetectInconsistencies(
        IEnumerable<AdvisoryCorrelation> correlations);
}
```
|
||||
|
||||
**CLI Access:**
|
||||
|
||||
All connectors are manageable via the `stella groundtruth sources` CLI commands:
|
||||
|
||||
```bash
# List all connectors
stella groundtruth sources list

# Sync specific connector
stella groundtruth sources sync --source buildinfo-debian --full

# Enable/disable connectors
stella groundtruth sources enable ddeb-ubuntu
stella groundtruth sources disable debuginfod-fedora
```
|
||||
|
||||
See the [Ground-Truth CLI Guide](../cli/guides/ground-truth-cli.md) for complete CLI documentation.
|
||||
|
||||
### 10.3 Key Performance Indicators
|
||||
|
||||
| KPI | Target | Description |
|
||||
|-----|--------|-------------|
|
||||
| Per-function match rate | >= 90% | Functions matched in post-patch binary |
|
||||
| False-negative patch detection | <= 5% | Patched functions incorrectly classified |
|
||||
| SBOM canonical-hash stability | 3/3 | Determinism across independent runs |
|
||||
| Binary reconstruction equivalence | Trend | Rebuilt binary matches original |
|
||||
| End-to-end verify time (p95, cold) | Trend | Offline verification performance |
|
||||
|
||||
### 10.4 Validation Harness
|
||||
|
||||
The validation harness (`IValidationHarness`) orchestrates end-to-end verification:
|
||||
|
||||
```
|
||||
Binary Pair (pre/post) → Symbol Recovery → IR Lifting → Fingerprinting → Matching → Metrics
|
||||
```
|
||||
|
||||
### 10.5 Evidence Bundle Format
|
||||
|
||||
Evidence bundles follow OCI/ORAS conventions:
|
||||
|
||||
```
|
||||
<pkg>-<advisory>-bundle.oci.tar
|
||||
├── manifest.json # OCI manifest
|
||||
└── blobs/
|
||||
├── sha256:<sbom> # Canonical SBOM
|
||||
├── sha256:<pre-bin> # Pre-fix binary
|
||||
├── sha256:<post-bin> # Post-fix binary
|
||||
├── sha256:<delta-sig> # DSSE delta-sig predicate
|
||||
└── sha256:<timestamp> # RFC 3161 timestamp
|
||||
```
|
||||
|
||||
### 10.6 Related Documentation
|
||||
|
||||
- [Golden Corpus KPIs](../../benchmarks/golden-corpus-kpis.md)
|
||||
- [Golden Corpus Seed List](../../benchmarks/golden-corpus-seed-list.md)
|
||||
- [Ground-Truth Corpus Specification](../../benchmarks/ground-truth-corpus.md)
|
||||
|
||||
---
|
||||
|
||||
## 11. References
|
||||
|
||||
- Advisory: `docs/product/advisories/21-Dec-2025 - Mapping Evidence Within Compiled Binaries.md`
|
||||
- Scanner Native Analysis: `src/Scanner/StellaOps.Scanner.Analyzers.Native/`
|
||||
@@ -1248,8 +1424,9 @@ binaryindex:
|
||||
- **Semantic Diffing Sprint:** `docs/implplan/SPRINT_20260105_001_001_BINDEX_semdiff_ir_semantics.md`
|
||||
- **Semantic Library:** `src/BinaryIndex/__Libraries/StellaOps.BinaryIndex.Semantic/`
|
||||
- **Semantic Tests:** `src/BinaryIndex/__Tests/StellaOps.BinaryIndex.Semantic.Tests/`
|
||||
- **Golden Corpus Sprints:** `docs/implplan/SPRINT_20260121_034_BinaryIndex_golden_corpus_foundation.md`
|
||||
|
||||
---
|
||||
|
||||
*Document Version: 1.1.1*
|
||||
*Last Updated: 2026-01-14*
|
||||
*Document Version: 1.2.0*
|
||||
*Last Updated: 2026-01-21*
|
||||
|
||||
347
docs/modules/binary-index/golden-corpus-layout.md
Normal file
@@ -0,0 +1,347 @@
|
||||
# Golden Corpus Folder Layout
|
||||
|
||||
Sprint: SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
|
||||
Task: GCB-006 - Document corpus folder layout and maintenance procedures
|
||||
|
||||
## Overview
|
||||
|
||||
The golden corpus is a curated dataset of pre/post security patch binary pairs used for:
|
||||
- Validating binary matching algorithms
|
||||
- Benchmarking reproducibility verification
|
||||
- Training machine learning models for function identification
|
||||
- Generating audit-ready evidence bundles
|
||||
|
||||
## Root Layout
|
||||
|
||||
```
|
||||
golden-corpus/
|
||||
├── corpus/ # Security pairs organized by distro
|
||||
│ ├── debian/
|
||||
│ ├── ubuntu/
|
||||
│ └── alpine/
|
||||
├── mirrors/ # Local mirrors of upstream sources
|
||||
│ ├── debian/
|
||||
│ ├── ubuntu/
|
||||
│ ├── alpine/
|
||||
│ └── osv/
|
||||
├── harness/ # Build and verification tooling
|
||||
│ ├── chroots/
|
||||
│ ├── lifter-matcher/
|
||||
│ ├── sbom-canonicalizer/
|
||||
│ └── verifier/
|
||||
├── evidence/ # Generated evidence bundles
|
||||
│ └── <pkg>-<advisory>-bundle.oci.tar
|
||||
└── bench/ # Benchmark data and baselines
|
||||
├── baselines/
|
||||
└── results/
|
||||
```
|
||||
|
||||
## Corpus Directory Structure
|
||||
|
||||
Each security pair follows a consistent structure:
|
||||
|
||||
```
|
||||
corpus/<distro>/<package>/<advisory-id>/
|
||||
├── pre/ # Pre-patch (vulnerable) artifacts
|
||||
│ ├── src/ # Source code
|
||||
│ │ ├── *.tar.gz # Original source tarball
|
||||
│ │ ├── debian/ # Packaging metadata
|
||||
│ │ └── buildinfo # Build reproducibility info
|
||||
│ └── debs/ # Built binaries
|
||||
│ ├── *.deb # Binary packages
|
||||
│ ├── *.ddeb # Debug symbols
|
||||
│ └── buildlog # Build log
|
||||
├── post/ # Post-patch (fixed) artifacts
|
||||
│ ├── src/
|
||||
│ └── debs/
|
||||
└── metadata/
|
||||
├── advisory.json # Advisory details
|
||||
├── osv.json # OSV format vulnerability
|
||||
├── pair-manifest.json # Pair configuration
|
||||
└── ground-truth.json # Function-level ground truth
|
||||
```
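
A quick structural check can catch incomplete pairs before the harness reads them. This is a minimal sketch under stated assumptions: `check_pair_layout` is a hypothetical helper, and it only verifies the three top-level directories shown above (`pre/`, `post/`, `metadata/`), because the files inside them differ per distro.

```bash
# Sketch: report which expected top-level directories are missing from a pair.
# Returns 0 when pre/, post/ and metadata/ all exist, 1 otherwise.
check_pair_layout() {
    pair_dir=$1
    missing=0
    for sub in pre post metadata; do
        if [ ! -d "$pair_dir/$sub" ]; then
            echo "missing: $pair_dir/$sub" >&2
            missing=1
        fi
    done
    return "$missing"
}
```

Example use: `check_pair_layout corpus/debian/openssl/DSA-5678-1 || echo "incomplete pair"`.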
|
||||
|
||||
### Debian Example
|
||||
|
||||
```
|
||||
corpus/debian/openssl/DSA-5678-1/
|
||||
├── pre/
|
||||
│ ├── src/
|
||||
│ │ ├── openssl_3.0.10.orig.tar.gz
|
||||
│ │ ├── openssl_3.0.10-1.debian.tar.xz
|
||||
│ │ ├── openssl_3.0.10-1.dsc
|
||||
│ │ └── openssl_3.0.10-1.buildinfo
|
||||
│ └── debs/
|
||||
│ ├── libssl3_3.0.10-1_amd64.deb
|
||||
│ ├── libssl3-dbgsym_3.0.10-1_amd64.ddeb
|
||||
│ └── build.log
|
||||
├── post/
|
||||
│ ├── src/
|
||||
│ │ ├── openssl_3.0.11.orig.tar.gz
|
||||
│ │ ├── openssl_3.0.11-1.debian.tar.xz
|
||||
│ │ └── ...
|
||||
│ └── debs/
|
||||
│ └── ...
|
||||
└── metadata/
|
||||
├── advisory.json
|
||||
└── ground-truth.json
|
||||
```
|
||||
|
||||
### Ubuntu Example
|
||||
|
||||
```
|
||||
corpus/ubuntu/curl/USN-1234-1/
|
||||
├── pre/
|
||||
│ ├── src/
|
||||
│ │ └── curl_8.4.0-1ubuntu1.tar.xz
|
||||
│ └── debs/
|
||||
│ └── libcurl4_8.4.0-1ubuntu1_amd64.deb
|
||||
├── post/
|
||||
│ └── ...
|
||||
└── metadata/
|
||||
├── advisory.json
|
||||
└── usn.json
|
||||
```
|
||||
|
||||
### Alpine Example
|
||||
|
||||
```
|
||||
corpus/alpine/zlib/CVE-2022-37434/
|
||||
├── pre/
|
||||
│ ├── src/
|
||||
│ │ └── APKBUILD
|
||||
│ └── apks/
|
||||
│ └── zlib-1.2.12-r2.apk
|
||||
├── post/
|
||||
│ └── ...
|
||||
└── metadata/
|
||||
└── secdb-entry.json
|
||||
```
|
||||
|
||||
## Mirrors Directory Structure
|
||||
|
||||
Local mirrors cache upstream artifacts for offline operation:
|
||||
|
||||
```
|
||||
mirrors/
|
||||
├── debian/
|
||||
│ ├── archive/ # snapshot.debian.org mirrors
|
||||
│ │ └── pool/main/o/openssl/
|
||||
│ ├── snapshot/ # Point-in-time snapshots
|
||||
│ │ └── 20260101T000000Z/
|
||||
│ └── buildinfo/ # buildinfos.debian.net cache
|
||||
│ └── <source-name>/
|
||||
├── ubuntu/
|
||||
│ ├── archive/ # archive.ubuntu.com mirrors
|
||||
│ ├── usn-index/ # USN metadata
|
||||
│ │ └── usn-db.json
|
||||
│ └── launchpad/ # Build logs from Launchpad
|
||||
├── alpine/
|
||||
│ ├── packages/ # Alpine package mirror
|
||||
│ └── secdb/ # Security database
|
||||
│ └── community.json
|
||||
└── osv/
|
||||
├── all.zip # Full OSV database
|
||||
└── debian/ # Distro-specific extracts
|
||||
```
|
||||
|
||||
## Harness Directory Structure
|
||||
|
||||
Build and verification tooling:
|
||||
|
||||
```
|
||||
harness/
|
||||
├── chroots/ # Build environments
|
||||
│ ├── debian-bookworm-amd64/
|
||||
│ ├── debian-bullseye-amd64/
|
||||
│ ├── ubuntu-noble-amd64/
|
||||
│ └── alpine-3.19-amd64/
|
||||
├── lifter-matcher/ # Binary analysis tools
|
||||
│ ├── ghidra/ # Ghidra installation
|
||||
│ ├── bsim-server/ # BSim database server
|
||||
│ └── semantic-diffing/ # Semantic diff tools
|
||||
├── sbom-canonicalizer/ # SBOM normalization
|
||||
│ └── config/
|
||||
└── verifier/ # Standalone verifier
|
||||
├── stella-verifier # Verifier binary
|
||||
└── trust-profiles/ # Trust profiles
|
||||
```
|
||||
|
||||
## Evidence Directory Structure
|
||||
|
||||
Generated bundles for audit/compliance:
|
||||
|
||||
```
|
||||
evidence/
|
||||
├── openssl-DSA-5678-1-bundle.oci.tar
|
||||
├── curl-USN-1234-1-bundle.oci.tar
|
||||
└── manifests/
|
||||
└── inventory.json
|
||||
```
|
||||
|
||||
### Bundle Internal Structure (OCI Format)
|
||||
|
||||
```
|
||||
openssl-DSA-5678-1-bundle.oci.tar/
|
||||
├── oci-layout # OCI layout version
|
||||
├── index.json # OCI index with referrers
|
||||
├── blobs/
|
||||
│ └── sha256/
|
||||
│ ├── <manifest> # Bundle manifest
|
||||
│ ├── <sbom-pre> # Pre-patch SBOM
|
||||
│ ├── <sbom-post> # Post-patch SBOM
|
||||
│ ├── <binary-pre> # Pre-patch binary
|
||||
│ ├── <binary-post> # Post-patch binary
|
||||
│ ├── <delta-sig> # DSSE delta-sig predicate
|
||||
│ ├── <provenance> # Build provenance
|
||||
│ └── <timestamp> # RFC 3161 timestamp
|
||||
└── manifest.json # Signed bundle manifest
|
||||
```
|
||||
|
||||
## Bench Directory Structure
|
||||
|
||||
Benchmark data and KPI baselines:
|
||||
|
||||
```
|
||||
bench/
|
||||
├── baselines/
|
||||
│ ├── current.json # Active KPI baseline
|
||||
│ └── archive/ # Historical baselines
|
||||
│ ├── baseline-20260115.json
|
||||
│ └── baseline-20260108.json
|
||||
├── results/
|
||||
│ ├── 20260122120000.json # Validation run results
|
||||
│ └── ...
|
||||
└── reports/
|
||||
└── regression-report-*.md
|
||||
```
|
||||
|
||||
### Baseline File Format
|
||||
|
||||
```json
|
||||
{
|
||||
"baselineId": "baseline-20260122120000",
|
||||
"createdAt": "2026-01-22T12:00:00Z",
|
||||
"source": "abc123def456",
|
||||
"description": "Post-semantic-diffing-v2 baseline",
|
||||
"precision": 0.95,
|
||||
"recall": 0.92,
|
||||
"falseNegativeRate": 0.08,
|
||||
"deterministicReplayRate": 1.0,
|
||||
"ttfrpP95Ms": 150,
|
||||
"additionalKpis": {}
|
||||
}
|
||||
```
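
Baseline values such as `precision` and `recall` are typically compared against the latest run with a small tolerance. The sketch below illustrates that comparison only; `kpi_regressed` is a hypothetical helper, the default tolerance of 0.01 is an assumption for illustration, and the actual regression logic of `stella groundtruth validate check` is not shown in this document.

```bash
# Sketch: succeed (exit 0) when `current` has dropped more than `tolerance`
# below `baseline`. awk does the floating-point comparison, which plain
# shell arithmetic cannot.
# Args: current value, baseline value, optional tolerance (default 0.01).
kpi_regressed() {
    awk -v cur="$1" -v base="$2" -v tol="${3:-0.01}" \
        'BEGIN { exit !(cur + 0 < base - tol) }'
}
```

Example use: `kpi_regressed 0.93 0.95 && echo "precision regression"`.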
|
||||
|
||||
## File Naming Conventions
|
||||
|
||||
| Type | Pattern | Example |
|
||||
|------|---------|---------|
|
||||
| Advisory ID (Debian) | `DSA-<number>-<revision>` | `DSA-5678-1` |
|
||||
| Advisory ID (Ubuntu) | `USN-<number>-<revision>` | `USN-1234-1` |
|
||||
| Advisory ID (Alpine) | `CVE-<year>-<number>` | `CVE-2022-37434` |
|
||||
| Bundle file | `<pkg>-<advisory>-bundle.oci.tar` | `openssl-DSA-5678-1-bundle.oci.tar` |
|
||||
| Baseline file | `baseline-<timestamp>.json` | `baseline-20260122120000.json` |
|
||||
| Results file | `<timestamp>.json` | `20260122120000.json` |
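
The patterns in the table above can be checked mechanically. The following is a minimal sketch, assuming shell glob patterns are precise enough for a first pass (they are looser than the table's patterns, since `[0-9]*` only requires one leading digit). `advisory_distro` and `bundle_name` are hypothetical helper names, not part of the `stella` CLI, and mapping `CVE-` identifiers to Alpine reflects only this corpus's directory convention.

```bash
# Sketch: map an advisory ID to the distro whose naming pattern it matches.
advisory_distro() {
    case "$1" in
        DSA-[0-9]*-[0-9]*) echo debian ;;
        USN-[0-9]*-[0-9]*) echo ubuntu ;;
        CVE-[0-9][0-9][0-9][0-9]-[0-9]*) echo alpine ;;
        *) echo unknown; return 1 ;;
    esac
}

# Sketch: build the evidence bundle file name for a package/advisory pair.
bundle_name() {
    printf '%s-%s-bundle.oci.tar\n' "$1" "$2"
}
```

Example use: `bundle_name openssl DSA-5678-1` produces the bundle file name listed in the table.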
|
||||
|
||||
## Metadata Files
|
||||
|
||||
### advisory.json
|
||||
|
||||
```json
|
||||
{
|
||||
"advisoryId": "DSA-5678-1",
|
||||
"cves": ["CVE-2024-1234", "CVE-2024-5678"],
|
||||
"package": "openssl",
|
||||
"vulnerableVersions": ["3.0.10-1"],
|
||||
"fixedVersions": ["3.0.11-1"],
|
||||
"severity": "high",
|
||||
"publishedAt": "2024-11-15T00:00:00Z",
|
||||
"summary": "Multiple vulnerabilities in OpenSSL"
|
||||
}
|
||||
```
|
||||
|
||||
### pair-manifest.json
|
||||
|
||||
```json
|
||||
{
|
||||
"pairId": "openssl-DSA-5678-1",
|
||||
"package": "openssl",
|
||||
"distribution": "debian",
|
||||
"suite": "bookworm",
|
||||
"architecture": "amd64",
|
||||
"preVersion": "3.0.10-1",
|
||||
"postVersion": "3.0.11-1",
|
||||
"binaries": [
|
||||
"libssl3",
|
||||
"libcrypto3"
|
||||
],
|
||||
"createdAt": "2026-01-15T10:00:00Z",
|
||||
"validatedAt": "2026-01-22T12:00:00Z"
|
||||
}
|
||||
```
|
||||
|
||||
### ground-truth.json
|
||||
|
||||
```json
|
||||
{
|
||||
"pairId": "openssl-DSA-5678-1",
|
||||
"binary": "libcrypto.so.3",
|
||||
"functions": [
|
||||
{
|
||||
"name": "EVP_DigestInit_ex",
|
||||
"preAddress": "0x12345",
|
||||
"postAddress": "0x12347",
|
||||
"status": "modified",
|
||||
"confidence": 1.0
|
||||
},
|
||||
{
|
||||
"name": "EVP_DigestUpdate",
|
||||
"preAddress": "0x12400",
|
||||
"postAddress": "0x12400",
|
||||
"status": "unchanged",
|
||||
"confidence": 1.0
|
||||
}
|
||||
],
|
||||
"metadata": {
|
||||
"generatedBy": "manual-annotation",
|
||||
"reviewedBy": "security-team",
|
||||
"reviewedAt": "2026-01-20T14:00:00Z"
|
||||
}
|
||||
}
|
||||
```
|
||||
|
||||
## Access Patterns
|
||||
|
||||
### Read-Only Access
|
||||
- Validation harness reads corpus pairs
|
||||
- CI reads baselines for regression checks
|
||||
- Auditors read evidence bundles
|
||||
|
||||
### Write Access
|
||||
- Corpus ingestion adds new pairs
|
||||
- Baseline update writes new baseline files
|
||||
- Bundle export creates evidence bundles
|
||||
|
||||
### Sync Access
|
||||
- Mirror sync updates upstream caches
|
||||
- Scheduled jobs refresh OSV database
|
||||
|
||||
## Storage Requirements
|
||||
|
||||
| Component | Typical Size | Growth Rate |
|
||||
|-----------|--------------|-------------|
|
||||
| Corpus (per pair) | 50-500 MB | N/A |
|
||||
| Mirrors (Debian) | 10-50 GB | Monthly |
|
||||
| Mirrors (Ubuntu) | 5-20 GB | Monthly |
|
||||
| Mirrors (Alpine) | 1-5 GB | Monthly |
|
||||
| OSV Database | 500 MB | Weekly |
|
||||
| Evidence bundles | 100-500 MB each | Per pair |
|
||||
| Baselines | < 10 KB each | Per run |
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Ground Truth Corpus Overview](ground-truth-corpus.md)
|
||||
- [Golden Corpus Maintenance](golden-corpus-maintenance.md)
|
||||
- [Corpus Ingestion Operations](corpus-ingestion-operations.md)
|
||||
- [Golden Corpus Operations Runbook](../../runbooks/golden-corpus-operations.md)
|
||||
492
docs/modules/binary-index/golden-corpus-maintenance.md
Normal file
@@ -0,0 +1,492 @@
|
||||
# Golden Corpus Maintenance
|
||||
|
||||
Sprint: SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
|
||||
Task: GCB-006 - Document corpus folder layout and maintenance procedures
|
||||
|
||||
## Overview
|
||||
|
||||
This document describes maintenance procedures for the golden corpus, including:
|
||||
- Mirror synchronization
|
||||
- Baseline management
|
||||
- Evidence bundle generation
|
||||
- Health monitoring
|
||||
|
||||
## Mirror Synchronization
|
||||
|
||||
### Automated Sync Schedule
|
||||
|
||||
Mirror sync should be automated via cron jobs or CI scheduled workflows.
|
||||
|
||||
#### Recommended Schedule
|
||||
|
||||
| Mirror | Frequency | Rationale |
|
||||
|--------|-----------|-----------|
|
||||
| Debian archive | Daily | Security updates published daily |
|
||||
| Debian buildinfo | Daily | Matches archive updates |
|
||||
| Ubuntu archive | Daily | Security updates published daily |
|
||||
| Ubuntu USN index | Hourly | USN metadata changes frequently |
|
||||
| Alpine secdb | Daily | Less frequent updates |
|
||||
| OSV database | Hourly | Aggregates multiple sources |
|
||||
|
||||
### Sync Scripts
|
||||
|
||||
#### Debian Mirror Sync
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# sync-debian-mirrors.sh
|
||||
# Syncs Debian archives and buildinfo
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
|
||||
DEBIAN_MIRROR="${DEBIAN_MIRROR:-https://snapshot.debian.org}"
|
||||
BUILDINFO_URL="${BUILDINFO_URL:-https://buildinfos.debian.net}"
|
||||
|
||||
# Packages to mirror (security-relevant)
|
||||
PACKAGES=(openssl curl zlib glibc libxml2 libpng)
|
||||
|
||||
# Sync source packages
|
||||
for pkg in "${PACKAGES[@]}"; do
|
||||
echo "Syncing Debian sources for: $pkg"
|
||||
|
||||
# Create package directory
|
||||
mkdir -p "$MIRRORS_ROOT/debian/archive/pool/main/${pkg:0:1}/$pkg"
|
||||
|
||||
# Download available versions
|
||||
rsync -avz --progress \
|
||||
"rsync://snapshot.debian.org/snapshot/debian/pool/main/${pkg:0:1}/$pkg/" \
|
||||
"$MIRRORS_ROOT/debian/archive/pool/main/${pkg:0:1}/$pkg/"
|
||||
done
|
||||
|
||||
# Sync buildinfo files
|
||||
for pkg in "${PACKAGES[@]}"; do
|
||||
echo "Syncing buildinfo for: $pkg"
|
||||
|
||||
mkdir -p "$MIRRORS_ROOT/debian/buildinfo/$pkg"
|
||||
|
||||
# Use wget to fetch buildinfo index and files
|
||||
wget -r -np -nH --cut-dirs=2 -P "$MIRRORS_ROOT/debian/buildinfo/$pkg" \
|
||||
"$BUILDINFO_URL/api/v1/buildinfo/$pkg/" || true
|
||||
done
|
||||
|
||||
echo "Debian mirror sync complete"
|
||||
date > "$MIRRORS_ROOT/debian/.last-sync"
|
||||
```
|
||||
|
||||
#### Ubuntu Mirror Sync
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# sync-ubuntu-mirrors.sh
|
||||
# Syncs Ubuntu archives and USN metadata
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
|
||||
UBUNTU_ARCHIVE="https://archive.ubuntu.com/ubuntu"
|
||||
USN_API="https://ubuntu.com/security/notices.json"
|
||||
|
||||
# Sync USN database
|
||||
echo "Syncing Ubuntu USN database..."
|
||||
mkdir -p "$MIRRORS_ROOT/ubuntu/usn-index"
|
||||
curl -sSL "$USN_API" -o "$MIRRORS_ROOT/ubuntu/usn-index/usn-db.json.tmp"
|
||||
mv "$MIRRORS_ROOT/ubuntu/usn-index/usn-db.json.tmp" "$MIRRORS_ROOT/ubuntu/usn-index/usn-db.json"
|
||||
|
||||
# Sync packages (similar to Debian)
|
||||
PACKAGES=(openssl curl zlib1g libxml2)
|
||||
|
||||
for pkg in "${PACKAGES[@]}"; do
|
||||
echo "Syncing Ubuntu sources for: $pkg"
|
||||
mkdir -p "$MIRRORS_ROOT/ubuntu/archive/pool/main/${pkg:0:1}/$pkg"
|
||||
# ... sync logic
|
||||
done
|
||||
|
||||
echo "Ubuntu mirror sync complete"
|
||||
date > "$MIRRORS_ROOT/ubuntu/.last-sync"
|
||||
```
|
||||
|
||||
#### Alpine SecDB Sync
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# sync-alpine-secdb.sh
|
||||
# Syncs Alpine security database
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
|
||||
ALPINE_SECDB="https://secdb.alpinelinux.org"
|
||||
|
||||
mkdir -p "$MIRRORS_ROOT/alpine/secdb"
|
||||
|
||||
# Download all security databases
|
||||
for branch in v3.17 v3.18 v3.19 v3.20 edge; do
|
||||
for repo in main community; do
|
||||
echo "Syncing Alpine secdb: $branch/$repo"
|
||||
curl -sSL "$ALPINE_SECDB/$branch/$repo.json" \
|
||||
-o "$MIRRORS_ROOT/alpine/secdb/${branch}-${repo}.json" || true
|
||||
done
|
||||
done
|
||||
|
||||
echo "Alpine secdb sync complete"
|
||||
date > "$MIRRORS_ROOT/alpine/.last-sync"
|
||||
```
|
||||
|
||||
#### OSV Database Sync
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# sync-osv.sh
|
||||
# Syncs OSV vulnerability database
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
|
||||
OSV_URL="https://osv-vulnerabilities.storage.googleapis.com"
|
||||
|
||||
mkdir -p "$MIRRORS_ROOT/osv"
|
||||
|
||||
# Download full database
|
||||
echo "Downloading OSV all.zip..."
|
||||
curl -sSL "$OSV_URL/all.zip" -o "$MIRRORS_ROOT/osv/all.zip.tmp"
|
||||
mv "$MIRRORS_ROOT/osv/all.zip.tmp" "$MIRRORS_ROOT/osv/all.zip"
|
||||
|
||||
# Extract ecosystem-specific databases
|
||||
for ecosystem in Debian Ubuntu Alpine; do
|
||||
mkdir -p "$MIRRORS_ROOT/osv/$ecosystem"
|
||||
unzip -o -q "$MIRRORS_ROOT/osv/all.zip" "$ecosystem/*" -d "$MIRRORS_ROOT/osv/" || true
|
||||
done
|
||||
|
||||
echo "OSV sync complete"
|
||||
date > "$MIRRORS_ROOT/osv/.last-sync"
|
||||
```
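
The Ubuntu and OSV scripts above download to a `.tmp` file and rename it into place, so readers never see a partially written file. The same pattern can be factored into a small helper and reused where a script writes directly to its destination. This is a sketch only: `atomic_write` is a hypothetical name and is not part of the scripts above.

```bash
# Sketch: run a command, capture its stdout in a temporary file, and rename
# the file into place only if the command succeeded. On failure the existing
# destination file is left untouched.
# Args: destination path, then the command and its arguments.
atomic_write() {
    dest=$1
    shift
    tmp="$dest.tmp"
    if "$@" > "$tmp"; then
        mv "$tmp" "$dest"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Example use, reusing the variables from the Alpine script: `atomic_write "$MIRRORS_ROOT/alpine/secdb/v3.19-main.json" curl -sSfL "$ALPINE_SECDB/v3.19/main.json"`. The rename is atomic only when the temporary file and the destination are on the same filesystem, which holds here because both live in the same directory.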
|
||||
|
||||
### Cron Configuration
|
||||
|
||||
```cron
|
||||
# /etc/cron.d/golden-corpus-sync
|
||||
|
||||
# Mirror sync jobs
|
||||
0 */4 * * * corpus /opt/golden-corpus/scripts/sync-debian-mirrors.sh >> /var/log/corpus/debian-sync.log 2>&1
|
||||
0 */4 * * * corpus /opt/golden-corpus/scripts/sync-ubuntu-mirrors.sh >> /var/log/corpus/ubuntu-sync.log 2>&1
|
||||
0 6 * * * corpus /opt/golden-corpus/scripts/sync-alpine-secdb.sh >> /var/log/corpus/alpine-sync.log 2>&1
|
||||
0 * * * * corpus /opt/golden-corpus/scripts/sync-osv.sh >> /var/log/corpus/osv-sync.log 2>&1
|
||||
|
||||
# Health check
|
||||
*/15 * * * * corpus /opt/golden-corpus/scripts/check-mirror-health.sh >> /var/log/corpus/health.log 2>&1
|
||||
```
|
||||
|
||||
## Baseline Management
|
||||
|
||||
### When to Update Baselines
|
||||
|
||||
Update the KPI baseline when:
|
||||
1. Algorithm improvements are merged (expected KPI improvement)
|
||||
2. New corpus pairs are added (may change baseline metrics)
|
||||
3. False positives/negatives are corrected in ground truth
|
||||
4. Major version upgrades of analysis tools
|
||||
|
||||
### Baseline Update Procedure
|
||||
|
||||
#### 1. Run Full Validation
|
||||
|
||||
```bash
|
||||
# Run validation on the full corpus
|
||||
stella groundtruth validate run \
|
||||
--matcher semantic-diffing \
|
||||
--output bench/results/$(date +%Y%m%d%H%M%S).json \
|
||||
--verbose
|
||||
```
|
||||
|
||||
#### 2. Review Results
|
||||
|
||||
```bash
# Check metrics
stella groundtruth validate metrics --run-id latest

# Compare against current baseline
# (assumes bench/results/latest.json points at the timestamped results file written in step 1)
stella groundtruth validate check \
  --results bench/results/latest.json \
  --baseline bench/baselines/current.json
```
|
||||
|
||||
#### 3. Update Baseline
|
||||
|
||||
Update the baseline only if the regression check passes or the change is an expected improvement:
|
||||
|
||||
```bash
|
||||
# Archive current baseline
|
||||
cp bench/baselines/current.json \
|
||||
bench/baselines/archive/baseline-$(date +%Y%m%d).json
|
||||
|
||||
# Update baseline
|
||||
stella groundtruth baseline update \
|
||||
--from-results bench/results/latest.json \
|
||||
--output bench/baselines/current.json \
|
||||
--description "Post algorithm-v2.3 update" \
|
||||
--source "$(git rev-parse HEAD)"
|
||||
```
|
||||
|
||||
#### 4. Commit and Document
|
||||
|
||||
```bash
|
||||
# Commit the baseline update
|
||||
git add bench/baselines/
|
||||
git commit -m "chore(bench): update golden corpus baseline
|
||||
|
||||
Reason: Algorithm v2.3 improvements
|
||||
Previous baseline: baseline-20260115.json
|
||||
|
||||
Metrics:
|
||||
- Precision: 0.95 -> 0.97 (+2pp)
|
||||
- Recall: 0.92 -> 0.94 (+2pp)
|
||||
- FN Rate: 0.08 -> 0.06 (-2pp)
|
||||
- Determinism: 100%
|
||||
- TTFRP p95: 150ms -> 140ms (-7%)"
|
||||
|
||||
git push
|
||||
```
|
||||
|
||||
### Baseline Rollback
|
||||
|
||||
If a baseline update causes issues:
|
||||
|
||||
```bash
|
||||
# Restore previous baseline
|
||||
cp bench/baselines/archive/baseline-20260115.json \
|
||||
bench/baselines/current.json
|
||||
|
||||
git add bench/baselines/current.json
|
||||
git commit -m "revert(bench): rollback baseline to 20260115"
|
||||
git push
|
||||
```
|
||||
|
||||
## Evidence Bundle Generation
|
||||
|
||||
### Manual Bundle Export
|
||||
|
||||
```bash
|
||||
# Export bundle for specific packages
|
||||
stella groundtruth bundle export \
|
||||
--packages openssl,curl,zlib \
|
||||
--distros debian,ubuntu \
|
||||
--output evidence/security-bundle-$(date +%Y%m%d).tar.gz \
|
||||
--sign-with-cosign \
|
||||
--include-debug \
|
||||
--include-kpis \
|
||||
--include-timestamps
|
||||
```
|
||||
|
||||
### Automated Bundle Generation
|
||||
|
||||
Schedule bundle generation for compliance reporting:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# generate-compliance-bundles.sh
|
||||
# Run monthly for audit evidence
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
EVIDENCE_DIR="/data/golden-corpus/evidence"
|
||||
MONTH=$(date +%Y%m)
|
||||
|
||||
# Generate bundles for each distro
|
||||
for distro in debian ubuntu alpine; do
|
||||
stella groundtruth bundle export \
|
||||
--distros "$distro" \
|
||||
--packages all \
|
||||
--output "$EVIDENCE_DIR/$distro-bundle-$MONTH.tar.gz" \
|
||||
--sign-with-cosign \
|
||||
--include-kpis \
|
||||
--include-timestamps
|
||||
done
|
||||
|
||||
# Create manifest
|
||||
echo "{\"month\": \"$MONTH\", \"bundles\": [\"debian\", \"ubuntu\", \"alpine\"]}" \
|
||||
> "$EVIDENCE_DIR/manifest-$MONTH.json"
|
||||
```
|
||||
|
||||
### Bundle Verification
|
||||
|
||||
Always verify bundles after generation:
|
||||
|
||||
```bash
|
||||
# Verify bundle integrity
|
||||
stella groundtruth bundle import \
|
||||
--input evidence/security-bundle-20260122.tar.gz \
|
||||
--verify \
|
||||
--trusted-keys /etc/stellaops/trusted-keys.pub \
|
||||
--trust-profile /etc/stellaops/trust-profiles/global.json \
|
||||
--output verification-report.md
|
||||
```
|
||||
|
||||
## Health Monitoring
|
||||
|
||||
### Doctor Checks
|
||||
|
||||
Run Doctor checks regularly to validate corpus health:
|
||||
|
||||
```bash
|
||||
# Run all corpus-related checks
|
||||
stella doctor --check "check.binaryanalysis.corpus.*"
|
||||
|
||||
# Specific checks
|
||||
stella doctor --check check.binaryanalysis.corpus.mirror.freshness
|
||||
stella doctor --check check.binaryanalysis.corpus.kpi.baseline
|
||||
stella doctor --check check.binaryanalysis.debuginfod.availability
|
||||
```
|
||||
|
||||
### Health Check Script
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# check-mirror-health.sh
|
||||
# Validates mirror freshness and connectivity
|
||||
|
||||
set -euo pipefail
|
||||
|
||||
MIRRORS_ROOT="${MIRRORS_ROOT:-/data/golden-corpus/mirrors}"
|
||||
STALE_THRESHOLD_DAYS=7
|
||||
ALERTS=""
|
||||
|
||||
check_mirror() {
|
||||
local mirror_name=$1
|
||||
local last_sync_file=$2
|
||||
local max_age=$3
|
||||
|
||||
if [[ ! -f "$last_sync_file" ]]; then
|
||||
ALERTS+="CRITICAL: $mirror_name has never been synced\n"
|
||||
return
|
||||
fi
|
||||
|
||||
local last_sync=$(cat "$last_sync_file")
|
||||
local last_sync_epoch=$(date -d "$last_sync" +%s)
|
||||
local now_epoch=$(date +%s)
|
||||
local age_days=$(( (now_epoch - last_sync_epoch) / 86400 ))
|
||||
|
||||
if [[ $age_days -gt $max_age ]]; then
|
||||
ALERTS+="WARNING: $mirror_name is $age_days days old (threshold: $max_age)\n"
|
||||
fi
|
||||
}
|
||||
|
||||
# Check each mirror
|
||||
check_mirror "Debian" "$MIRRORS_ROOT/debian/.last-sync" $STALE_THRESHOLD_DAYS
|
||||
check_mirror "Ubuntu" "$MIRRORS_ROOT/ubuntu/.last-sync" $STALE_THRESHOLD_DAYS
|
||||
check_mirror "Alpine" "$MIRRORS_ROOT/alpine/.last-sync" $STALE_THRESHOLD_DAYS
|
||||
check_mirror "OSV" "$MIRRORS_ROOT/osv/.last-sync" 1 # OSV syncs hourly; alert once it is more than 1 day old
|
||||
|
||||
# Check connectivity
|
||||
for url in \
|
||||
"https://snapshot.debian.org" \
|
||||
"https://buildinfos.debian.net" \
|
||||
"https://ubuntu.com/security/notices.json" \
|
||||
"https://secdb.alpinelinux.org"; do
|
||||
|
||||
if ! curl -sSf --connect-timeout 5 "$url" > /dev/null 2>&1; then
|
||||
ALERTS+="ERROR: Cannot reach $url\n"
|
||||
fi
|
||||
done
|
||||
|
||||
# Report results
|
||||
if [[ -n "$ALERTS" ]]; then
|
||||
echo -e "Golden Corpus Health Issues:\n$ALERTS"
|
||||
# Send alert (customize for your alerting system)
|
||||
# curl -X POST -d "$ALERTS" https://alerts.example.com/webhook
|
||||
exit 1
|
||||
fi
|
||||
|
||||
echo "All mirrors healthy at $(date)"
|
||||
```
|
||||
|
||||
### Monitoring Metrics
|
||||
|
||||
Export these metrics to your monitoring system:
|
||||
|
||||
| Metric | Description | Alert Threshold |
|
||||
|--------|-------------|-----------------|
|
||||
| `corpus.mirrors.age_seconds` | Time since last mirror sync | > 604800 (7 days) |
|
||||
| `corpus.pairs.total` | Total number of security pairs | N/A (info) |
|
||||
| `corpus.validation.precision` | Latest precision rate | < baseline - 0.01 |
|
||||
| `corpus.validation.recall` | Latest recall rate | < baseline - 0.01 |
|
||||
| `corpus.validation.determinism` | Deterministic replay rate | < 1.0 |
|
||||
| `corpus.bundle.count` | Number of evidence bundles | N/A (info) |
|
||||
| `corpus.baseline.age_days` | Days since baseline update | > 30 days |
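
The mirror age can be derived from the `.last-sync` files that the sync scripts write. The sketch below prints one sample in the Prometheus text exposition format, which could then be picked up by, for example, a textfile collector. Assumptions: `mirror_age_metric` is a hypothetical helper, the metric name `corpus_mirror_age_seconds` and its `mirror` label are borrowed from the alert rule example in this document, and GNU `date` is available to parse the stored timestamp.

```bash
# Sketch: print corpus_mirror_age_seconds{mirror="<name>"} <age-in-seconds>.
# Args: mirror name, path to its .last-sync file, optional "now" in epoch seconds.
mirror_age_metric() {
    name=$1
    sync_file=$2
    now=${3:-$(date +%s)}
    [ -f "$sync_file" ] || return 1
    # GNU date parses the timestamp text stored in the file
    last=$(date -d "$(cat "$sync_file")" +%s) || return 1
    printf 'corpus_mirror_age_seconds{mirror="%s"} %s\n' "$name" "$((now - last))"
}
```

Example use: `mirror_age_metric debian "$MIRRORS_ROOT/debian/.last-sync"`. The optional third argument exists so the calculation can be tested with a fixed clock.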
|
||||
|
||||
### Prometheus Metrics Example
|
||||
|
||||
```yaml
# prometheus-corpus-metrics.yaml
groups:
  - name: golden-corpus
    rules:
      - alert: CorpusMirrorStale
        expr: corpus_mirror_age_seconds > 604800 # 7 days
        labels:
          severity: warning
        annotations:
          summary: "Corpus mirror {{ $labels.mirror }} is stale"

      - alert: CorpusRegressionDetected
        expr: corpus_validation_precision < corpus_baseline_precision - 0.01
        labels:
          severity: critical
        annotations:
          summary: "Precision regression detected in golden corpus validation"

      - alert: CorpusDeterminismFailure
        expr: corpus_validation_determinism < 1.0
        labels:
          severity: critical
        annotations:
          summary: "Non-deterministic replay detected"
```
|
||||
|
||||
## Cleanup and Archival
|
||||
|
||||
### Archive Old Results
|
||||
|
||||
```bash
#!/bin/bash
# archive-old-results.sh
# Archives results older than 90 days

set -euo pipefail
shopt -s nullglob # with no matching files, *.json expands to nothing

RESULTS_DIR="/data/golden-corpus/bench/results"
ARCHIVE_DIR="/data/golden-corpus/bench/archive"
AGE_DAYS=90

mkdir -p "$ARCHIVE_DIR"

# Move result files last modified more than AGE_DAYS days ago
find "$RESULTS_DIR" -name "*.json" -mtime +"$AGE_DAYS" -exec \
    mv {} "$ARCHIVE_DIR/" \;

# Compress archived results by month (file names start with YYYYMM).
# Note: an existing results-<month>.tar.gz is overwritten, not appended to.
cd "$ARCHIVE_DIR"
for month in $(printf '%s\n' *.json | cut -c1-6 | sort -u); do
    tar -czf "results-$month.tar.gz" "${month}"*.json && \
        rm -f "${month}"*.json
done
```
|
||||
|
||||
### Prune Old Baselines
|
||||
|
||||
Keep only the last N baselines:
|
||||
|
||||
```bash
|
||||
#!/bin/bash
|
||||
# prune-baselines.sh
|
||||
# Keeps only the 10 most recent baseline archives
|
||||
|
||||
BASELINE_ARCHIVE="/data/golden-corpus/bench/baselines/archive"
|
||||
KEEP_COUNT=10
|
||||
|
||||
cd "$BASELINE_ARCHIVE"
|
||||
ls -t baseline-*.json | tail -n +$((KEEP_COUNT + 1)) | xargs -r rm -f
|
||||
```
|
||||
|
||||
## Related Documentation
|
||||
|
||||
- [Golden Corpus Folder Layout](golden-corpus-layout.md)
|
||||
- [Ground Truth Corpus Overview](ground-truth-corpus.md)
|
||||
- [Golden Corpus Operations Runbook](../../runbooks/golden-corpus-operations.md)
|
||||
@@ -23,10 +23,12 @@ The `stella` CLI is the operator-facing Swiss army knife for scans, exports, pol
|
||||
- Versioned command docs in `docs/modules/cli/guides`.
|
||||
- Plugin catalogue in `plugins/cli/**` (restart-only).
|
||||
|
||||
## Related resources
|
||||
- ./guides/20_REFERENCE.md
|
||||
- ./guides/cli-reference.md
|
||||
- ./guides/policy.md
|
||||
## Related resources
|
||||
- ./guides/20_REFERENCE.md
|
||||
- ./guides/cli-reference.md
|
||||
- ./guides/commands/analytics.md
|
||||
- ./guides/policy.md
|
||||
- ./guides/trust-profiles.md
|
||||
|
||||
## Backlog references
|
||||
- DOCS-CLI-OBS-52-001 / DOCS-CLI-FORENSICS-53-001 in ../../TASKS.md.
|
||||
|
||||
@@ -51,10 +51,11 @@ Status key:
|
||||
|
||||
| UI capability | CLI command(s) | Status | Notes / Tasks |
|
||||
|---------------|----------------|--------|---------------|
|
||||
| Advisory observations search | `stella vuln observations` | ✅ Available | Implemented via `BuildVulnCommand`. |
|
||||
| Advisory linkset export | `stella advisory linkset show/export` | 🟩 Planned | `CLI-LNM-22-001`. |
|
||||
| VEX observations / linksets | `stella vex obs get/linkset show` | 🟩 Planned | `CLI-LNM-22-002`. |
|
||||
| SBOM overlay export | `stella sbom overlay apply/export` | 🟩 Planned | Scoped to upcoming SBOM CLI sprint (`SBOM-CONSOLE-23-001/002` + CLI backlog). |
|
||||
| Advisory observations search | `stella vuln observations` | ✅ Available | Implemented via `BuildVulnCommand`. |
|
||||
| Advisory linkset export | `stella advisory linkset show/export` | 🟩 Planned | `CLI-LNM-22-001`. |
|
||||
| VEX observations / linksets | `stella vex obs get/linkset show` | 🟩 Planned | `CLI-LNM-22-002`. |
|
||||
| SBOM overlay export | `stella sbom overlay apply/export` | 🟩 Planned | Scoped to upcoming SBOM CLI sprint (`SBOM-CONSOLE-23-001/002` + CLI backlog). |
|
||||
| SBOM Lake analytics (`/analytics/sbom-lake`) | `stella analytics sbom-lake <subcommand>` | ✅ Available | CLI guide at `docs/modules/cli/guides/commands/analytics.md` (SPRINT_20260120_032). |
|
||||
|
||||
---
|
||||
|
||||
@@ -151,5 +152,5 @@ The script should emit a parity report that feeds into the Downloads workspace (
|
||||
|
||||
---
|
||||
|
||||
*Last updated: 2025-10-28 (Sprint 23).*
|
||||
*Last updated: 2026-01-20 (Sprint 20260120).*
|
||||
|
||||
|
||||
@@ -1,5 +1,5 @@
|
||||
version: 1
|
||||
generated: 2025-12-01T00:00:00Z
|
||||
generated: 2026-01-20T00:00:00Z
|
||||
compatibility:
|
||||
policy: "SemVer-like: commands/flags/exitCodes are backwards compatible within major version."
|
||||
deprecation:
|
||||
@@ -38,6 +38,108 @@ commands:
|
||||
0: success
|
||||
4: auth-misconfigured
|
||||
5: token-invalid
|
||||
- name: analytics
|
||||
subcommands:
|
||||
- name: sbom-lake
|
||||
subcommands:
|
||||
- name: suppliers
|
||||
formats: [table, json, csv]
|
||||
flags:
|
||||
- name: environment
|
||||
required: false
|
||||
- name: limit
|
||||
required: false
|
||||
- name: format
|
||||
required: false
|
||||
values: [table, json, csv]
|
||||
- name: output
|
||||
required: false
|
||||
exitCodes:
|
||||
0: success
|
||||
1: error
|
||||
- name: licenses
|
||||
formats: [table, json, csv]
|
||||
flags:
|
||||
- name: environment
|
||||
required: false
|
||||
- name: limit
|
||||
required: false
|
||||
- name: format
|
||||
required: false
|
||||
values: [table, json, csv]
|
||||
- name: output
|
||||
required: false
|
||||
exitCodes:
|
||||
0: success
|
||||
1: error
|
||||
- name: vulnerabilities
|
||||
formats: [table, json, csv]
|
||||
flags:
|
||||
- name: environment
|
||||
required: false
|
||||
- name: min-severity
|
||||
required: false
|
||||
values: [critical, high, medium, low]
|
||||
- name: limit
|
||||
required: false
|
||||
- name: format
|
||||
required: false
|
||||
values: [table, json, csv]
|
||||
- name: output
|
||||
required: false
|
||||
exitCodes:
|
||||
0: success
|
||||
1: error
|
||||
- name: backlog
|
||||
formats: [table, json, csv]
|
||||
flags:
|
||||
- name: environment
|
||||
required: false
|
||||
- name: limit
|
||||
required: false
|
||||
- name: format
|
||||
required: false
|
||||
values: [table, json, csv]
|
||||
- name: output
|
||||
required: false
|
||||
exitCodes:
|
||||
0: success
|
||||
1: error
|
||||
- name: attestation-coverage
|
||||
formats: [table, json, csv]
|
||||
flags:
|
||||
- name: environment
|
||||
required: false
|
||||
- name: limit
|
||||
required: false
|
||||
- name: format
|
||||
required: false
|
||||
values: [table, json, csv]
|
||||
- name: output
|
||||
required: false
|
||||
exitCodes:
|
||||
0: success
|
||||
1: error
|
||||
- name: trends
|
||||
formats: [table, json, csv]
|
||||
flags:
|
||||
- name: environment
|
||||
required: false
|
||||
- name: days
|
||||
required: false
|
||||
- name: series
|
||||
required: false
|
||||
values: [vulnerabilities, components, all]
|
||||
- name: limit
|
||||
required: false
|
||||
- name: format
|
||||
required: false
|
||||
values: [table, json, csv]
|
||||
- name: output
|
||||
required: false
|
||||
exitCodes:
|
||||
0: success
|
||||
1: error
|
||||
telemetry:
|
||||
defaultEnabled: false
|
||||
envVars:
|
||||
|
||||
47
docs/modules/cli/guides/commands/analytics.md
Normal file
@@ -0,0 +1,47 @@
|
||||
# stella analytics - Command Guide
|
||||
|
||||
## Commands
|
||||
- `stella analytics sbom-lake suppliers [--environment <env>] [--limit <n>] [--format table|json|csv] [--output <path>]`
|
||||
- `stella analytics sbom-lake licenses [--environment <env>] [--limit <n>] [--format table|json|csv] [--output <path>]`
|
||||
- `stella analytics sbom-lake vulnerabilities [--environment <env>] [--min-severity <level>] [--limit <n>] [--format table|json|csv] [--output <path>]`
|
||||
- `stella analytics sbom-lake backlog [--environment <env>] [--limit <n>] [--format table|json|csv] [--output <path>]`
|
||||
- `stella analytics sbom-lake attestation-coverage [--environment <env>] [--limit <n>] [--format table|json|csv] [--output <path>]`
|
||||
- `stella analytics sbom-lake trends [--environment <env>] [--days <n>] [--series vulnerabilities|components|all] [--limit <n>] [--format table|json|csv] [--output <path>]`
|
||||
|
||||
## Flags (common)
|
||||
- `--format`: Output format for rendering (`table`, `json`, `csv`).
|
||||
- `--output`: Write output to a file path instead of stdout.
|
||||
- `--limit`: Cap the number of rows returned.
|
||||
- `--environment`: Filter by environment name.
|
||||
|
||||
## SBOM lake notes
|
||||
- Endpoints require the `analytics.read` scope.
|
||||
- `--min-severity` accepts `critical`, `high`, `medium`, `low`.
|
||||
- `--series` controls trend output (`vulnerabilities`, `components`, `all`).
|
||||
- Tables use deterministic ordering (severity and counts first, then names).
|
||||
|
||||
## Examples
|
||||
|
||||
```bash
# Top suppliers
stella analytics sbom-lake suppliers --limit 20

# License distribution as CSV (prod)
stella analytics sbom-lake licenses --environment prod --format csv --output licenses.csv

# Vulnerability exposure in prod (high+)
stella analytics sbom-lake vulnerabilities --environment prod --min-severity high

# Fixable backlog with table output
stella analytics sbom-lake backlog --environment prod --limit 50

# Attestation coverage in staging, JSON output
stella analytics sbom-lake attestation-coverage --environment stage --format json

# 30-day trend snapshot (both series)
stella analytics sbom-lake trends --days 30 --series all --format csv --output trends.csv
```
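
The per-view commands above lend themselves to a small batch export. The following is a minimal sketch, not part of the CLI: the function name `export_sbom_lake_views` and the `STELLA_BIN` variable are assumptions made for illustration, and only the subcommands and flags listed in this guide are used.

```bash
#!/usr/bin/env bash
# Hypothetical helper: export every sbom-lake view to CSV for one environment.
# STELLA_BIN is an assumption so the command can be swapped or stubbed.
STELLA_BIN="${STELLA_BIN:-stella}"

export_sbom_lake_views() {
  local env="$1" outdir="$2" view
  mkdir -p "$outdir" || return 1
  for view in suppliers licenses vulnerabilities backlog attestation-coverage trends; do
    # Each view is written to <outdir>/<view>.csv; stop on the first failure.
    "$STELLA_BIN" analytics sbom-lake "$view" \
      --environment "$env" --format csv --output "$outdir/$view.csv" || return 1
  done
}
```

A call such as `export_sbom_lake_views prod ./analytics-export` would then produce one CSV per view.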
|
||||
|
||||
## Offline/verification note
|
||||
- If analytics exports arrive via offline bundles, verify the bundle first with
|
||||
`stella bundle verify` before importing data into downstream reports.
|
||||
@@ -16,6 +16,7 @@ graph TD
|
||||
CLI --> EXPLAIN[Explainability]
|
||||
CLI --> VEX[VEX & Decisioning]
|
||||
CLI --> SBOM[SBOM Operations]
|
||||
CLI --> ANALYTICS[Analytics & Insights]
|
||||
CLI --> REPORT[Reporting & Export]
|
||||
CLI --> OFFLINE[Offline Operations]
|
||||
CLI --> SYSTEM[System & Config]
|
||||
@@ -742,6 +743,601 @@ stella sbom merge --sbom <path1> --sbom <path2> [--output <path>] [--verbose]
|
||||
|
||||
---
|
||||
|
||||
## Analytics Commands
|
||||
|
||||
### stella analytics sbom-lake
|
||||
|
||||
Query SBOM lake analytics views (suppliers, licenses, vulnerabilities, backlog,
|
||||
attestation coverage, trends).
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
stella analytics sbom-lake <subcommand> [options]
|
||||
```
|
||||
|
||||
**Subcommands:**
|
||||
- `suppliers` - Supplier concentration
|
||||
- `licenses` - License distribution
|
||||
- `vulnerabilities` - CVE exposure (VEX-adjusted)
|
||||
- `backlog` - Fixable vulnerability backlog
|
||||
- `attestation-coverage` - Provenance/SLSA coverage
|
||||
- `trends` - Time-series trends (vulnerabilities/components)
|
||||
|
||||
**Common options:**
|
||||
| Option | Description |
|--------|-------------|
| `--environment <env>` | Filter to a specific environment |
| `--min-severity <level>` | Minimum severity (`critical`, `high`, `medium`, `low`); `vulnerabilities` only |
| `--days <n>` | Lookback window in days (`trends` only) |
| `--series <name>` | Trend series (`vulnerabilities`, `components`, `all`); `trends` only |
| `--limit <n>` | Maximum number of rows |
| `--format <fmt>` | Output format: `table`, `json`, `csv` |
| `--output <path>` | Output file path |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella analytics sbom-lake vulnerabilities --environment prod --min-severity high --format csv --output vuln.csv
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
## Ground-Truth Corpus Commands
|
||||
|
||||
### stella groundtruth
|
||||
|
||||
Manage ground-truth corpus for patch-paired binary verification. The corpus supports
|
||||
precision validation of security advisories by maintaining symbol and binary pairs
|
||||
from upstream sources.
|
||||
|
||||
**Sprint:** SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
stella groundtruth <subcommand> [options]
|
||||
```
|
||||
|
||||
**Subcommands:**
|
||||
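- `sources` - Manage symbol source connectors
- `symbols` - Query and search symbols in the corpus
- `pairs` - Manage security pairs (vuln/patch binary pairs)
- `validate` - Run validation and view metrics
- `bundle` - Export and import evidence bundles for offline verification
- `baseline` - Manage KPI baselines for regression detection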
|
||||
|
||||
---
|
||||
|
||||
### stella groundtruth sources
|
||||
|
||||
Manage upstream symbol source connectors.
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
stella groundtruth sources <command> [options]
|
||||
```
|
||||
|
||||
**Subcommands:**
|
||||
|
||||
#### stella groundtruth sources list
|
||||
|
||||
List available symbol source connectors.
|
||||
|
||||
```bash
|
||||
stella groundtruth sources list [--output-format table|json] [--verbose]
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
ID Display Name Status Last Sync
|
||||
------------------------------------------------------------------------------------------
|
||||
debuginfod-fedora Fedora Debuginfod Enabled 2026-01-22T10:00:00Z
|
||||
debuginfod-ubuntu Ubuntu Debuginfod Enabled 2026-01-22T10:00:00Z
|
||||
ddeb-ubuntu Ubuntu ddebs Enabled 2026-01-22T09:30:00Z
|
||||
buildinfo-debian Debian Buildinfo Enabled 2026-01-22T08:00:00Z
|
||||
secdb-alpine Alpine SecDB Enabled 2026-01-22T06:00:00Z
|
||||
```
|
||||
|
||||
#### stella groundtruth sources enable
|
||||
|
||||
Enable a symbol source connector.
|
||||
|
||||
```bash
|
||||
stella groundtruth sources enable <source> [--verbose]
|
||||
```
|
||||
|
||||
**Arguments:**
|
||||
- `<source>` - Source connector ID (e.g., `debuginfod-fedora`)
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth sources enable debuginfod-fedora
|
||||
```
|
||||
|
||||
#### stella groundtruth sources disable
|
||||
|
||||
Disable a symbol source connector.
|
||||
|
||||
```bash
|
||||
stella groundtruth sources disable <source> [--verbose]
|
||||
```
|
||||
|
||||
#### stella groundtruth sources sync
|
||||
|
||||
Synchronize symbol sources from upstream.
|
||||
|
||||
```bash
|
||||
stella groundtruth sources sync [--source <id>] [--full] [--verbose]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Description |
|
||||
|--------|-------------|
|
||||
| `--source <id>` | Source connector ID (all if not specified) |
|
||||
| `--full` | Perform a full sync instead of incremental |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
# Incremental sync of all sources
|
||||
stella groundtruth sources sync
|
||||
|
||||
# Full sync of Debian buildinfo
|
||||
stella groundtruth sources sync --source buildinfo-debian --full
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### stella groundtruth symbols
|
||||
|
||||
Query and search symbols in the corpus.
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
stella groundtruth symbols <command> [options]
|
||||
```
|
||||
|
||||
#### stella groundtruth symbols lookup
|
||||
|
||||
Lookup symbols by debug ID (build-id).
|
||||
|
||||
```bash
|
||||
stella groundtruth symbols lookup --debug-id <id> [--output-format table|json] [--verbose]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Alias | Description | Required |
|
||||
|--------|-------|-------------|----------|
|
||||
| `--debug-id` | `-d` | Debug ID (build-id) to lookup | Yes |
|
||||
| `--output-format` | `-O` | Output format: `table`, `json` | No |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a --output-format json
|
||||
```
|
||||
|
||||
**Output (table):**
|
||||
```
|
||||
Binary: libcrypto.so.3
|
||||
Architecture: x86_64
|
||||
Distribution: debian-bookworm
|
||||
Package: openssl@3.0.11-1
|
||||
Symbol Count: 4523
|
||||
Sources: debuginfod-fedora, buildinfo-debian
|
||||
```
|
||||
|
||||
#### stella groundtruth symbols search
|
||||
|
||||
Search symbols by package or distribution.
|
||||
|
||||
```bash
|
||||
stella groundtruth symbols search [--package <name>] [--distro <distro>] [--limit <n>] [--output-format table|json] [--verbose]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Alias | Description | Default |
|
||||
|--------|-------|-------------|---------|
|
||||
| `--package` | `-p` | Package name to search for | - |
|
||||
| `--distro` | | Distribution filter (debian, ubuntu, alpine) | - |
|
||||
| `--limit` | `-l` | Maximum results | 20 |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth symbols search --package openssl --distro debian --limit 50
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
### stella groundtruth pairs
|
||||
|
||||
Manage security pairs (vulnerable/patched binary pairs) in the corpus.
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
stella groundtruth pairs <command> [options]
|
||||
```
|
||||
|
||||
#### stella groundtruth pairs create
|
||||
|
||||
Create a new security pair.
|
||||
|
||||
```bash
|
||||
stella groundtruth pairs create --cve <cve-id> --vuln-pkg <pkg=ver> --patch-pkg <pkg=ver> [--distro <distro>] [--verbose]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Description | Required |
|
||||
|--------|-------------|----------|
|
||||
| `--cve` | CVE identifier | Yes |
|
||||
| `--vuln-pkg` | Vulnerable package (name=version) | Yes |
|
||||
| `--patch-pkg` | Patched package (name=version) | Yes |
|
||||
| `--distro` | Distribution (e.g., `debian-bookworm`) | No |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth pairs create \
|
||||
--cve CVE-2024-1234 \
|
||||
--vuln-pkg openssl=3.0.10-1 \
|
||||
--patch-pkg openssl=3.0.11-1 \
|
||||
--distro debian-bookworm
|
||||
```
|
||||
|
||||
#### stella groundtruth pairs list
|
||||
|
||||
List security pairs in the corpus.
|
||||
|
||||
```bash
|
||||
stella groundtruth pairs list [--cve <pattern>] [--package <name>] [--limit <n>] [--output-format table|json] [--verbose]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Alias | Description | Default |
|
||||
|--------|-------|-------------|---------|
|
||||
| `--cve` | | Filter by CVE (supports wildcards: `CVE-2024-*`) | - |
|
||||
| `--package` | `-p` | Filter by package name | - |
|
||||
| `--limit` | `-l` | Maximum results | 50 |
|
||||
|
||||
**Example:**
|
||||
```bash
# Quote the wildcard so the shell does not expand it before the CLI sees it
stella groundtruth pairs list --cve "CVE-2024-*" --package openssl --limit 100
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
Pair ID CVE Package Vuln Version Patch Version
|
||||
-------------------------------------------------------------------------------
|
||||
pair-001 CVE-2024-1234 openssl 3.0.10-1 3.0.11-1
|
||||
pair-002 CVE-2024-5678 curl 8.4.0-1 8.5.0-1
|
||||
```
|
||||
|
||||
#### stella groundtruth pairs delete
|
||||
|
||||
Delete a security pair from the corpus.
|
||||
|
||||
```bash
|
||||
stella groundtruth pairs delete <pair-id> [--force] [--verbose]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Alias | Description |
|
||||
|--------|-------|-------------|
|
||||
| `--force` | `-f` | Skip confirmation prompt |
|
||||
|
||||
---
|
||||
|
||||
### stella groundtruth validate
|
||||
|
||||
Run validation harness against security pairs.
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
stella groundtruth validate <command> [options]
|
||||
```
|
||||
|
||||
#### stella groundtruth validate run
|
||||
|
||||
Run validation on security pairs.
|
||||
|
||||
```bash
|
||||
stella groundtruth validate run [--pairs <pattern>] [--matcher <type>] [--output <path>] [--parallel <n>] [--verbose]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Alias | Description | Default |
|
||||
|--------|-------|-------------|---------|
|
||||
| `--pairs` | `-p` | Pair filter pattern (e.g., `openssl:CVE-2024-*`) | all |
|
||||
| `--matcher` | `-m` | Matcher type: `semantic-diffing`, `hash-based`, `hybrid` | `semantic-diffing` |
|
||||
| `--output` | `-o` | Output file for validation report | - |
|
||||
| `--parallel` | | Maximum parallel validations | 4 |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth validate run \
|
||||
--pairs "openssl:CVE-2024-*" \
|
||||
--matcher semantic-diffing \
|
||||
--parallel 8 \
|
||||
--output validation-report.md
|
||||
```
|
||||
|
||||
**Output:**
|
||||
```
|
||||
Validating pairs: 10/10
|
||||
Validation complete. Run ID: vr-20260122100532
|
||||
Function Match Rate: 94.2%
|
||||
False-Negative Rate: 2.1%
|
||||
SBOM Hash Stability: 3/3
|
||||
Report written to: validation-report.md
|
||||
```
|
||||
|
||||
#### stella groundtruth validate metrics
|
||||
|
||||
View metrics for a validation run.
|
||||
|
||||
```bash
|
||||
stella groundtruth validate metrics --run-id <id> [--output-format table|json] [--verbose]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Alias | Description | Required |
|
||||
|--------|-------|-------------|----------|
|
||||
| `--run-id` | `-r` | Validation run ID | Yes |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth validate metrics --run-id vr-20260122100532 --output-format json
|
||||
```
|
||||
|
||||
**Output (table):**
|
||||
```
|
||||
Run ID: vr-20260122100532
|
||||
Duration: 2026-01-22T10:00:00Z - 2026-01-22T10:15:32Z
|
||||
Pairs: 48/50 successful
|
||||
Function Match Rate: 94.2%
|
||||
False-Negative Rate: 2.1%
|
||||
SBOM Hash Stability: 3/3
|
||||
Verify Time (p50/p95): 423ms / 1.2s
|
||||
```
|
||||
|
||||
#### stella groundtruth validate export
|
||||
|
||||
Export validation report.
|
||||
|
||||
```bash
|
||||
stella groundtruth validate export --run-id <id> --output <path> [--format <fmt>] [--verbose]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Alias | Description | Default |
|
||||
|--------|-------|-------------|---------|
|
||||
| `--run-id` | `-r` | Validation run ID | (required) |
|
||||
| `--output` | `-o` | Output file path | (required) |
|
||||
| `--format` | `-f` | Export format: `markdown`, `html`, `json` | `markdown` |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth validate export \
|
||||
--run-id vr-20260122100532 \
|
||||
--format markdown \
|
||||
--output validation-report.md
|
||||
```
|
||||
|
||||
**See Also:** [Ground-Truth CLI Guide](../ground-truth-cli.md)
|
||||
|
||||
---
|
||||
|
||||
### stella groundtruth bundle
|
||||
|
||||
Manage evidence bundles for offline verification of patch provenance.
|
||||
|
||||
**Sprint:** SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
stella groundtruth bundle <command> [options]
|
||||
```
|
||||
|
||||
**Subcommands:**
|
||||
- `export` - Create evidence bundles for air-gapped environments
|
||||
- `import` - Import and verify evidence bundles
|
||||
|
||||
#### stella groundtruth bundle export
|
||||
|
||||
Export evidence bundles containing pre/post binaries, SBOMs, delta-sig predicates, and timestamps.
|
||||
|
||||
```bash
|
||||
stella groundtruth bundle export [options]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Description | Required |
|
||||
|--------|-------------|----------|
|
||||
| `--packages <list>` | Comma-separated package names (e.g., `openssl,curl`) | Yes |
|
||||
| `--distros <list>` | Comma-separated distributions (e.g., `debian,ubuntu`) | Yes |
|
||||
| `--output <path>` | Output bundle path (.tar.gz or .oci.tar) | Yes |
|
||||
| `--sign-with <signer>` | Signing method: `cosign`, `sigstore`, `none` | No |
|
||||
| `--include-debug` | Include debug symbols | No |
|
||||
| `--include-kpis` | Include KPI validation results | No |
|
||||
| `--include-timestamps` | Include RFC 3161 timestamps | No |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth bundle export \
|
||||
--packages openssl,zlib,glibc \
|
||||
--distros debian,fedora \
|
||||
--output evidence/security-bundle.tar.gz \
|
||||
--sign-with cosign \
|
||||
--include-debug \
|
||||
--include-kpis \
|
||||
--include-timestamps
|
||||
```
|
||||
|
||||
**Exit Codes:**
|
||||
- `0` - Bundle created successfully
|
||||
- `1` - Bundle creation failed
|
||||
- `2` - Invalid input or configuration error
|
||||
|
||||
#### stella groundtruth bundle import
|
||||
|
||||
Import and verify evidence bundles in air-gapped environments.
|
||||
|
||||
```bash
|
||||
stella groundtruth bundle import [options]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Description | Required |
|
||||
|--------|-------------|----------|
|
||||
| `--input <path>` | Input bundle path | Yes |
|
||||
| `--verify-signature` | Verify bundle signatures | No |
|
||||
| `--trusted-keys <path>` | Path to trusted public keys | No |
|
||||
| `--trust-profile <path>` | Trust profile for verification | No |
|
||||
| `--output <path>` | Output verification report | No |
|
||||
| `--format <fmt>` | Report format: `markdown`, `json`, `html` | No |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth bundle import \
|
||||
--input symbol-bundle.tar.gz \
|
||||
--verify-signature \
|
||||
--trusted-keys /etc/stellaops/trusted-keys.pub \
|
||||
--trust-profile /etc/stellaops/trust-profiles/global.json \
|
||||
--output verification-report.md
|
||||
```
|
||||
|
||||
**Verification Steps:**
|
||||
1. Validate bundle manifest signature
2. Verify all blob digests match manifest
3. Validate DSSE envelope signatures against trusted keys
4. Verify RFC 3161 timestamps against trusted TSA certificates
5. Run IR matcher to confirm patched functions
6. Verify SBOM canonical hash matches signed predicate
7. Output verification report with KPI line items
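
Step 2 of the list above (digest checking) can be illustrated in isolation. This is a sketch under stated assumptions, not the CLI's implementation: it assumes a hypothetical `manifest.sha256` file at the bundle root in `sha256sum` format (`<hex digest>  <relative path>`), whereas the real bundle manifest layout is not described here. The function name `verify_blob_digests` is also made up for the example.

```bash
#!/usr/bin/env bash
# Sketch of the digest check only. Assumes a hypothetical manifest.sha256
# in `sha256sum` format at the bundle root; the real manifest may differ.
verify_blob_digests() {
  local bundle_dir="$1"
  # -c recomputes each listed digest and returns non-zero on any mismatch.
  ( cd "$bundle_dir" && sha256sum -c manifest.sha256 >/dev/null )
}
```

The function returns zero only when every listed file matches its recorded digest, which mirrors the pass/fail nature of the verification step.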
|
||||
|
||||
**Exit Codes:**
|
||||
- `0` - All verifications passed
- `1` - One or more verifications failed
- `2` - Invalid input or configuration error
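
In scripts, the three exit codes above can be mapped to distinct outcomes so that a failed verification is not confused with a usage error. A minimal sketch follows; the function name `describe_import_exit` and the status strings are hypothetical.

```bash
#!/usr/bin/env bash
# Maps the documented exit codes of `stella groundtruth bundle import`
# to a short status string. Function name and strings are illustrative.
describe_import_exit() {
  case "$1" in
    0) echo "verified" ;;
    1) echo "verification-failed" ;;
    2) echo "invalid-input" ;;
    *) echo "unexpected-exit-$1" ;;
  esac
}

# Usage:
#   stella groundtruth bundle import --input symbol-bundle.tar.gz --verify-signature
#   status="$(describe_import_exit "$?")"
```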
|
||||
|
||||
---
|
||||
|
||||
### stella groundtruth validate check
|
||||
|
||||
Check KPI regression against baseline thresholds.
|
||||
|
||||
**Sprint:** SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
|
||||
|
||||
```bash
|
||||
stella groundtruth validate check [options]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Description | Default |
|
||||
|--------|-------------|---------|
|
||||
| `--results <path>` | Path to validation results JSON | (required) |
|
||||
| `--baseline <path>` | Path to baseline JSON | (required) |
|
||||
| `--precision-threshold <pp>` | Max precision drop (percentage points) | 0.01 |
|
||||
| `--recall-threshold <pp>` | Max recall drop (percentage points) | 0.01 |
|
||||
| `--fn-rate-threshold <pp>` | Max FN rate increase (percentage points) | 0.01 |
|
||||
| `--determinism-threshold <rate>` | Min determinism rate | 1.0 |
|
||||
| `--ttfrp-threshold <pct>` | Max TTFRP p95 increase (percentage) | 0.20 |
|
||||
| `--output <path>` | Output report path | stdout |
|
||||
| `--format <fmt>` | Report format: `markdown`, `json` | `markdown` |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth validate check \
|
||||
--results bench/results/20260122.json \
|
||||
--baseline bench/baselines/current.json \
|
||||
--precision-threshold 0.01 \
|
||||
--recall-threshold 0.01 \
|
||||
--fn-rate-threshold 0.01 \
|
||||
--determinism-threshold 1.0 \
|
||||
--output regression-report.md
|
||||
```
|
||||
|
||||
**Regression Gates:**
|
||||
| Metric | Threshold | Action |
|--------|-----------|--------|
| Precision | Drops > threshold | Fail |
| Recall | Drops > threshold | Fail |
| False-negative rate | Increases > threshold | Fail |
| Deterministic replay | Drops below threshold | Fail |
| TTFRP p95 | Increases > threshold | Warn |
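
The "drops" and "increases" gates in the table reduce to comparing a baseline KPI with the current run. The sketch below shows that comparison only; it assumes KPIs and thresholds are plain decimal fractions (as in the baseline output), and the helper names are hypothetical. How the CLI itself performs the comparison is not specified here.

```bash
#!/usr/bin/env bash
# Sketch of the "drops > threshold" and "increases > threshold" gates.
# Assumes KPIs and thresholds are decimal fractions; helper names are made up.

# exceeds_drop BASELINE CURRENT THRESHOLD -> exit 0 when the gate is breached
exceeds_drop() {
  awk -v b="$1" -v c="$2" -v t="$3" 'BEGIN { exit !((b - c) > t) }'
}

# exceeds_increase BASELINE CURRENT THRESHOLD -> exit 0 when the gate is breached
exceeds_increase() {
  awk -v b="$1" -v c="$2" -v t="$3" 'BEGIN { exit !((c - b) > t) }'
}
```

For example, `exceeds_drop 0.95 0.90 0.01` signals a breach, while an improvement never does. Values sitting exactly on the threshold are sensitive to floating-point rounding, so a real implementation may want a tolerance.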
|
||||
|
||||
**Exit Codes:**
|
||||
- `0` - All gates passed
|
||||
- `1` - One or more gates failed
|
||||
- `2` - Invalid input or configuration error
|
||||
|
||||
---
|
||||
|
||||
### stella groundtruth baseline
|
||||
|
||||
Manage KPI baselines for regression detection.
|
||||
|
||||
**Sprint:** SPRINT_20260121_036_BinaryIndex_golden_corpus_bundle_verification
|
||||
|
||||
**Usage:**
|
||||
```bash
|
||||
stella groundtruth baseline <command> [options]
|
||||
```
|
||||
|
||||
**Subcommands:**
|
||||
- `update` - Update baseline from validation results
|
||||
- `show` - Display baseline contents
|
||||
|
||||
#### stella groundtruth baseline update
|
||||
|
||||
Update baseline from validation results.
|
||||
|
||||
```bash
|
||||
stella groundtruth baseline update [options]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Description | Required |
|
||||
|--------|-------------|----------|
|
||||
| `--from-results <path>` | Path to validation results JSON | Yes |
|
||||
| `--output <path>` | Output baseline path | Yes |
|
||||
| `--description <text>` | Description for the baseline update | No |
|
||||
| `--source <commit>` | Source commit SHA for traceability | No |
|
||||
|
||||
**Example:**
|
||||
```bash
|
||||
stella groundtruth baseline update \
|
||||
--from-results bench/results/20260122.json \
|
||||
--output bench/baselines/current.json \
|
||||
--description "Post algorithm-v2.3 update" \
|
||||
--source "$(git rev-parse HEAD)"
|
||||
```
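
One way to combine `validate check` and `baseline update` in CI is to refresh the baseline only when the regression check passes. The following is a sketch of that flow, not a prescribed workflow: the function name `promote_baseline` and the `STELLA_BIN` variable are assumptions, and only flags documented above are used.

```bash
#!/usr/bin/env bash
# Hypothetical CI step: refresh the baseline only when the regression check
# passes. STELLA_BIN is an assumption so the CLI can be stubbed in tests.
STELLA_BIN="${STELLA_BIN:-stella}"

promote_baseline() {
  local results="$1" baseline="$2" commit="$3"
  # Propagate the check's exit code (1 = gate failed, 2 = invalid input).
  "$STELLA_BIN" groundtruth validate check \
    --results "$results" --baseline "$baseline" || return $?
  "$STELLA_BIN" groundtruth baseline update \
    --from-results "$results" --output "$baseline" --source "$commit"
}
```

A typical call would be `promote_baseline bench/results/20260122.json bench/baselines/current.json "$(git rev-parse HEAD)"`.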
|
||||
|
||||
#### stella groundtruth baseline show
|
||||
|
||||
Display baseline contents.
|
||||
|
||||
```bash
|
||||
stella groundtruth baseline show --baseline <path> [--format table|json]
|
||||
```
|
||||
|
||||
**Options:**
|
||||
| Option | Description | Default |
|
||||
|--------|-------------|---------|
|
||||
| `--baseline <path>` | Path to baseline JSON | (required) |
|
||||
| `--format` | Output format: `table`, `json` | `table` |
|
||||
|
||||
**Output (table):**
|
||||
```
|
||||
Baseline ID: baseline-20260122120000
|
||||
Created: 2026-01-22T12:00:00Z
|
||||
Source: abc123def456
|
||||
Description: Post-semantic-diffing-v2 baseline
|
||||
|
||||
KPIs:
|
||||
Precision: 0.9500
|
||||
Recall: 0.9200
|
||||
False Negative Rate: 0.0800
|
||||
Determinism: 1.0000
|
||||
TTFRP p95: 150ms
|
||||
```
|
||||
|
||||
**See Also:** [Ground-Truth CLI Guide](../ground-truth-cli.md)
|
||||
|
||||
---
|
||||
## Reporting & Export Commands
|
||||
|
||||
### stella report
|
||||
|
||||
351
docs/modules/cli/guides/ground-truth-cli.md
Normal file
@@ -0,0 +1,351 @@
|
||||
# Ground-Truth Corpus CLI Guide
|
||||
|
||||
**Sprint:** SPRINT_20260121_035_BinaryIndex_golden_corpus_connectors_cli
|
||||
|
||||
## Overview
|
||||
|
||||
The `stella groundtruth` command group provides CLI access to the ground-truth corpus for patch-paired binary verification. This corpus enables precision validation of security advisories by maintaining symbol and binary pairs from upstream distribution sources.
|
||||
|
||||
## Use Cases
|
||||
|
||||
- **Security teams**: Validate patch presence in production binaries
|
||||
- **Compliance auditors**: Generate evidence bundles for air-gapped verification
|
||||
- **DevSecOps**: Integrate corpus validation into CI/CD pipelines
|
||||
- **Researchers**: Query symbol databases for vulnerability analysis
|
||||
|
||||
## Prerequisites
|
||||
|
||||
- Stella CLI installed and configured
|
||||
- Backend connectivity to Platform service (or offline bundle)
|
||||
- For sync operations: network access to upstream sources
|
||||
|
||||
## Command Structure
|
||||
|
||||
```
|
||||
stella groundtruth
|
||||
├── sources # Manage symbol source connectors
|
||||
│ ├── list # List available connectors
|
||||
│ ├── enable # Enable a connector
|
||||
│ ├── disable # Disable a connector
|
||||
│ └── sync # Sync from upstream
|
||||
├── symbols # Query symbols in corpus
|
||||
│ ├── lookup # Lookup by debug ID
|
||||
│ └── search # Search by package/distro
|
||||
├── pairs # Manage security pairs
|
||||
│ ├── create # Create vuln/patch pair
|
||||
│ ├── list # List existing pairs
|
||||
│ └── delete # Remove a pair
|
||||
└── validate # Run validation harness
|
||||
├── run # Execute validation
|
||||
├── metrics # View run metrics
|
||||
└── export # Export report
|
||||
```
|
||||
|
||||
## Source Connectors
|
||||
|
||||
The ground-truth corpus ingests data from multiple upstream sources:
|
||||
|
||||
| Connector ID | Distribution | Data Type | Description |
|--------------|--------------|-----------|-------------|
| `debuginfod-fedora` | Fedora | Debug symbols | ELF debuginfo via debuginfod protocol |
| `debuginfod-ubuntu` | Ubuntu | Debug symbols | ELF debuginfo via debuginfod protocol |
| `ddeb-ubuntu` | Ubuntu | Debug packages | `.ddeb` debug symbol packages |
| `buildinfo-debian` | Debian | Build metadata | `.buildinfo` reproducibility records |
| `secdb-alpine` | Alpine | Security DB | `secfixes` YAML from APKBUILD |
|
||||
|
||||
### List Sources
|
||||
|
||||
```bash
|
||||
stella groundtruth sources list
|
||||
|
||||
# Output:
|
||||
ID Display Name Status Last Sync
|
||||
------------------------------------------------------------------------------------------
|
||||
debuginfod-fedora Fedora Debuginfod Enabled 2026-01-22T10:00:00Z
|
||||
debuginfod-ubuntu Ubuntu Debuginfod Enabled 2026-01-22T10:00:00Z
|
||||
ddeb-ubuntu Ubuntu ddebs Enabled 2026-01-22T09:30:00Z
|
||||
buildinfo-debian Debian Buildinfo Enabled 2026-01-22T08:00:00Z
|
||||
secdb-alpine Alpine SecDB Enabled 2026-01-22T06:00:00Z
|
||||
```
|
||||
|
||||
### Enable/Disable Sources
|
||||
|
||||
```bash
|
||||
# Enable a source connector
|
||||
stella groundtruth sources enable debuginfod-fedora
|
||||
|
||||
# Disable a source connector (stops future syncs)
|
||||
stella groundtruth sources disable debuginfod-fedora
|
||||
```
|
||||
|
||||
### Sync Sources
|
||||
|
||||
```bash
|
||||
# Incremental sync of all enabled sources
|
||||
stella groundtruth sources sync
|
||||
|
||||
# Full sync of a specific source
|
||||
stella groundtruth sources sync --source buildinfo-debian --full
|
||||
|
||||
# Sync with verbose output
|
||||
stella groundtruth sources sync --source ddeb-ubuntu --verbose
|
||||
```
|
||||
|
||||
## Symbol Operations
|
||||
|
||||
### Lookup by Debug ID
|
||||
|
||||
Query symbols using the ELF GNU Build-ID or equivalent identifier:
|
||||
|
||||
```bash
|
||||
# Lookup by build-id
|
||||
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a
|
||||
|
||||
# JSON output
|
||||
stella groundtruth symbols lookup --debug-id 7f8a9b2c4d5e6f1a --output-format json
|
||||
```
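
If the build-id of a local binary is not already known, it can usually be read from the ELF notes with `readelf -n` (binutils). The helper below is a sketch: `extract_build_id` is a hypothetical name, and the library path in the usage comment is only an illustration.

```bash
#!/usr/bin/env bash
# Hypothetical helper: pull the GNU build-id out of `readelf -n` output.
# Reads readelf output on stdin so it can be exercised without an ELF file.
extract_build_id() {
  awk '/Build ID:/ { print $NF; exit }'
}

# Usage (requires binutils and a real ELF binary; the path is illustrative):
#   debug_id="$(readelf -n /usr/lib/x86_64-linux-gnu/libcrypto.so.3 | extract_build_id)"
#   stella groundtruth symbols lookup --debug-id "$debug_id"
```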
|
||||
|
||||
**Example output:**
|
||||
```
|
||||
Binary: libcrypto.so.3
|
||||
Architecture: x86_64
|
||||
Distribution: debian-bookworm
|
||||
Package: openssl@3.0.11-1
|
||||
Symbol Count: 4523
|
||||
Sources: debuginfod-fedora, buildinfo-debian
|
||||
```
|
||||
|
||||
### Search Symbols
|
||||
|
||||
Search across the corpus by package name or distribution:
|
||||
|
||||
```bash
|
||||
# Search by package
|
||||
stella groundtruth symbols search --package openssl
|
||||
|
||||
# Filter by distribution
|
||||
stella groundtruth symbols search --package openssl --distro debian
|
||||
|
||||
# Limit results
|
||||
stella groundtruth symbols search --package curl --limit 100
|
||||
```
|
||||
|
||||
## Security Pairs
|
||||
|
||||
Security pairs link vulnerable and patched binary versions for a specific CVE.
|
||||
|
||||
### Create a Pair
|
||||
|
||||
```bash
|
||||
stella groundtruth pairs create \
|
||||
--cve CVE-2024-1234 \
|
||||
--vuln-pkg openssl=3.0.10-1 \
|
||||
--patch-pkg openssl=3.0.11-1 \
|
||||
--distro debian-bookworm
|
||||
```
|
||||
|
||||
### List Pairs
|
||||
|
||||
```bash
|
||||
# List all pairs
|
||||
stella groundtruth pairs list
|
||||
|
||||
# Filter by CVE pattern
|
||||
stella groundtruth pairs list --cve "CVE-2024-*"
|
||||
|
||||
# Filter by package
|
||||
stella groundtruth pairs list --package openssl --limit 50
|
||||
|
||||
# JSON output
|
||||
stella groundtruth pairs list --output-format json
|
||||
```
|
||||
|
||||
**Example output:**
|
||||
```
|
||||
Pair ID CVE Package Vuln Version Patch Version
|
||||
-------------------------------------------------------------------------------
|
||||
pair-001 CVE-2024-1234 openssl 3.0.10-1 3.0.11-1
|
||||
pair-002 CVE-2024-5678 curl 8.4.0-1 8.5.0-1
|
||||
```
|
||||
|
||||
### Delete a Pair
|
||||
|
||||
```bash
# Delete with confirmation prompt
stella groundtruth pairs delete pair-001

# Skip confirmation
stella groundtruth pairs delete pair-001 --force
```
|
||||
|
||||
## Validation Harness
|
||||
|
||||
The validation harness runs end-to-end verification against security pairs.
|
||||
|
||||
### Run Validation
|
||||
|
||||
```bash
# Validate all pairs
stella groundtruth validate run

# Validate specific pairs (pattern match)
stella groundtruth validate run --pairs "openssl:CVE-2024-*"

# Use specific matcher
stella groundtruth validate run --matcher semantic-diffing

# Parallel validation with report output
stella groundtruth validate run \
  --pairs "curl:*" \
  --parallel 8 \
  --output validation-report.md
```
|
||||
|
||||
**Matcher types:**
|
||||
| Matcher | Description |
|---------|-------------|
| `semantic-diffing` | IR-level semantic comparison (default) |
| `hash-based` | Function hash matching |
| `hybrid` | Combined semantic + hash approach |
|
||||
|
||||
### View Metrics
|
||||
|
||||
```bash
stella groundtruth validate metrics --run-id vr-20260122100532

# JSON output
stella groundtruth validate metrics --run-id vr-20260122100532 --output-format json
```
|
||||
|
||||
**Example output:**
|
||||
```
Run ID: vr-20260122100532
Duration: 2026-01-22T10:00:00Z - 2026-01-22T10:15:32Z
Pairs: 48/50 successful
Function Match Rate: 94.2%
False-Negative Rate: 2.1%
SBOM Hash Stability: 3/3
Verify Time (p50/p95): 423ms / 1.2s
```
|
||||
|
||||
### Export Reports
|
||||
|
||||
```bash
# Export as Markdown
stella groundtruth validate export \
  --run-id vr-20260122100532 \
  --format markdown \
  --output report.md

# Export as HTML
stella groundtruth validate export \
  --run-id vr-20260122100532 \
  --format html \
  --output report.html

# Export as JSON (machine-readable)
stella groundtruth validate export \
  --run-id vr-20260122100532 \
  --format json \
  --output report.json
```
|
||||
|
||||
## CI/CD Integration
|
||||
|
||||
### GitHub Actions Example
|
||||
|
||||
```yaml
name: Corpus Validation
on:
  schedule:
    - cron: '0 6 * * 1'  # Weekly on Monday

jobs:
  validate:
    runs-on: ubuntu-latest
    steps:
      - name: Sync corpus sources
        run: stella groundtruth sources sync

      - name: Run validation
        run: |
          stella groundtruth validate run \
            --matcher semantic-diffing \
            --parallel 4 \
            --output validation-${{ github.run_id }}.md

      - name: Check metrics
        run: |
          # NOTE: run-id.txt is not produced by the steps shown here; it is
          # assumed to contain the ID of the validation run to check.
          MATCH_RATE=$(stella groundtruth validate metrics --run-id "$(cat run-id.txt)" --output-format json | jq '.functionMatchRate')
          if (( $(echo "$MATCH_RATE < 90" | bc -l) )); then
            echo "Match rate below threshold: $MATCH_RATE%"
            exit 1
          fi
```
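
The "Check metrics" step gates the job on `functionMatchRate` with `jq` and `bc`. The same gate can be sketched in Python for pipelines that prefer a script. This assumes the JSON metrics expose `functionMatchRate` as a percentage number, as the workflow above implies; `match_rate_ok` is a hypothetical helper, not part of the CLI.

```python
import json


def match_rate_ok(metrics_json: str, threshold: float = 90.0) -> bool:
    """Return True when functionMatchRate (percent) is at or above the threshold."""
    metrics = json.loads(metrics_json)
    # Assumed field name, taken from the jq filter in the workflow above.
    rate = float(metrics["functionMatchRate"])
    return rate >= threshold
```

A CI script could pipe the `--output-format json` result into this check and exit non-zero when it returns `False`.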
|
||||
|
||||
### GitLab CI Example
|
||||
|
||||
```yaml
corpus-validation:
  stage: verify
  script:
    - stella groundtruth sources sync --source buildinfo-debian
    - stella groundtruth validate run --pairs "openssl:*" --output report.md
  artifacts:
    paths:
      - report.md
    expire_in: 1 week
  rules:
    - if: $CI_PIPELINE_SOURCE == "schedule"
```
|
||||
|
||||
## Offline Usage
|
||||
|
||||
For air-gapped environments, use offline bundles:
|
||||
|
||||
```bash
# Export corpus for offline use
stella bundle export \
  --include-corpus \
  --output corpus-bundle-$(date +%F).tar.gz

# Import on air-gapped system
stella bundle import --package corpus-bundle-2026-01-22.tar.gz

# Run validation offline
stella groundtruth validate run --offline
```
|
||||
|
||||
## Troubleshooting
|
||||
|
||||
### Common Issues
|
||||
|
||||
**Sync fails with network error:**
|
||||
```bash
# Check source status
stella groundtruth sources list

# Retry with verbose output
stella groundtruth sources sync --source debuginfod-ubuntu -v
```
|
||||
|
||||
**Symbol lookup returns no results:**
|
||||
```bash
# Verify debug-id format (hex string)
stella groundtruth symbols lookup --debug-id abc123 -v

# Try searching by package instead
stella groundtruth symbols search --package libcrypto
```
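
Because a malformed debug ID is a common cause of empty lookups, it can help to check the value before calling the CLI. This is a minimal sketch that assumes only what the comment above states, namely that a debug ID is a hex string; `is_hex_debug_id` is a hypothetical helper.

```python
import re

# One or more hex digits, nothing else.
_HEX_RE = re.compile(r"[0-9a-fA-F]+")


def is_hex_debug_id(value: str) -> bool:
    """Return True if value is a non-empty string of hex digits."""
    return _HEX_RE.fullmatch(value) is not None
```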
|
||||
|
||||
**Validation metrics show low match rate:**
|
||||
- Check that both the vulnerable and the patched binaries are present in the corpus
- Verify that symbol sources are synced and enabled
- Consider using the `hybrid` matcher for complex cases
|
||||
|
||||
## See Also
|
||||
|
||||
- [CLI Command Reference](commands/reference.md#ground-truth-corpus-commands)
- [BinaryIndex Architecture](../../binary-index/architecture.md)
- [Golden Corpus KPIs](../../benchmarks/golden-corpus-kpis.md)
- [Air-Gap Bundle Guide](../../modules/airgap/README.md)
|
||||
36
docs/modules/cli/guides/trust-profiles.md
Normal file
@@ -0,0 +1,36 @@
|
||||
# Trust Profiles

Trust profiles are offline trust-store templates for bundle verification. They define trust roots, Rekor public keys, and TSA roots in a single file so operators can apply a profile to a local trust store.

Default profile location:
- `etc/trust-profiles/*.trustprofile.json`
- Assets referenced by profiles live under `etc/trust-profiles/assets/`

Profile structure (summary):
- `profileId`: stable identifier (used by CLI commands)
- `trustRoots[]`: signing trust roots (PEM files)
- `rekorKeys[]`: Rekor public keys for offline inclusion proof verification
- `tsaRoots[]`: TSA roots for RFC3161 verification
- `metadata`: optional compliance metadata
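
A quick pre-flight check of a profile file can catch missing sections before `apply` is run. This is a sketch only: the key names come from the summary above, but treating every key except `metadata` as required is an assumption, and the element layout and asset paths in `EXAMPLE_PROFILE` are hypothetical.

```python
# Keys from the profile summary; "metadata" is documented as optional.
REQUIRED_KEYS = ("profileId", "trustRoots", "rekorKeys", "tsaRoots")

# Hypothetical profile content; real profiles may structure entries differently.
EXAMPLE_PROFILE = {
    "profileId": "global",
    "trustRoots": ["assets/example-root.pem"],
    "rekorKeys": ["assets/example-rekor.pub"],
    "tsaRoots": ["assets/example-tsa.pem"],
}


def missing_profile_keys(profile: dict) -> list:
    """Return the assumed-required keys that are absent from a profile."""
    return [key for key in REQUIRED_KEYS if key not in profile]
```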
|
||||
|
||||
CLI usage:
- `stella trust-profile list`
- `stella trust-profile show <profile-id>`
- `stella trust-profile apply <profile-id> --output <dir>`

Profile lookup overrides:
- `--profiles-dir <path>` to point at a custom profiles directory
- `STELLAOPS_TRUST_PROFILES` environment variable for default lookup

Apply output:
- `trust-manifest.json` (trust roots manifest for offline verification)
- `trust-profile.json` (resolved profile copy)
- `trust-root.pem` (combined trust roots for CLI verification)
- `trust-roots/`, `rekor/`, `tsa/` folders with PEM assets

Example apply workflow:
1. `stella trust-profile apply global --output ./trust-store`
2. `stella bundle verify --trust-root ./trust-store/trust-root.pem`

Note:
- Default profiles ship with placeholder roots for scaffolding only. Replace them with compliance-approved roots before production use.
|
||||
@@ -10,18 +10,68 @@ The SBOM Learning API enables Concelier to learn which advisories are relevant t
|
||||
Concelier normalizes incoming CycloneDX 1.7 and SPDX 3.0.1 documents into the internal `ParsedSbom` model for matching and downstream analysis.
|
||||
|
||||
Current extraction coverage (SPRINT_20260119_015):
|
||||
- Document metadata: format, specVersion, serialNumber, created, name, namespace when present
|
||||
- Components: bomRef, type, name, version, purl, cpe, hashes (including SPDX verifiedUsing), license IDs/expressions, license text (base64 decode), external references, properties, scope/modified, supplier/manufacturer, evidence, pedigree, cryptoProperties, modelCard (CycloneDX)
|
||||
- Dependencies: component dependency edges (CycloneDX dependencies, SPDX relationships)
|
||||
- Document metadata: format, specVersion, serialNumber, created, name, profiles, sbomType, namespace/imports
|
||||
- Components: bomRef, type, name, version, purl, cpe, hashes (including SPDX verifiedUsing), license IDs/expressions, license text (base64 decode), external references, properties, scope/modified, supplier/manufacturer, evidence, pedigree, cryptoProperties, modelCard (CycloneDX), swid (CycloneDX), SPDX AI model parameters, SPDX dataset metadata, SPDX file/snippet properties
|
||||
- Licensing: SPDX Licensing profile elements (listed/custom licenses, license additions, AND/OR/WITH/or-later operators), with OSI/FSF flags and deprecated IDs captured
|
||||
- Dependencies: component dependency edges (CycloneDX dependencies, SPDX relationships; DependencyOf is inverted to DependsOn)
|
||||
- Vulnerabilities: CycloneDX embedded vulnerabilities (ratings, affects, VEX analysis), SPDX Security profile vulnerabilities + VEX assessments
|
||||
- Services: endpoints, authentication, crossesTrustBoundary, data flows, licenses, external references (CycloneDX)
|
||||
- Formulation: components, workflows, tasks, properties (CycloneDX)
|
||||
- Declarations/definitions: attestations, affirmations, standards, signatures (CycloneDX)
|
||||
- Compositions/annotations (CycloneDX)
|
||||
- Build metadata: buildId, buildType, timestamps, config source, environment, parameters (SPDX)
|
||||
- Document properties
|
||||
|
||||
Notes:
|
||||
- Full SPDX Licensing profile objects, vulnerabilities, and other SPDX profiles are pending in SPRINT_20260119_015.
|
||||
- License expressions can be validated against embedded SPDX license/exception lists via `ILicenseExpressionValidator`.
|
||||
- Matching currently uses PURL and CPE; additional fields are stored for downstream consumers.
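
The dependency normalization mentioned above (a `DependencyOf` relationship is inverted to `DependsOn`) can be sketched as follows. The tuple layout and the `normalize_edges` helper are hypothetical; the relationship names are taken from the note above and are not checked against a particular SPDX serialization.

```python
def normalize_edges(relationships):
    """Return (dependent, dependency) edges, flipping DependencyOf relationships.

    Each relationship is a (source, type, target) tuple. Other relationship
    types are ignored in this sketch.
    """
    edges = []
    for source, rel_type, target in relationships:
        if rel_type == "DependsOn":
            edges.append((source, target))
        elif rel_type == "DependencyOf":
            # "zlib DependencyOf app" means "app DependsOn zlib".
            edges.append((target, source))
    return edges
```

Normalizing to a single direction lets CycloneDX dependency lists and SPDX relationships feed the same graph structure.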
|
||||
|
||||
## VEX consumption
|
||||
When SBOM vulnerabilities include embedded VEX analysis, Concelier consumes the statements to filter or annotate advisory matches. NotAffected statements can be filtered when policy allows, and trust evaluation checks timestamps, signatures (when provided), and justification requirements for not-affected claims.
|
||||
|
||||
Configuration (YAML or JSON), loaded from `Concelier:VexConsumption:PolicyPath`:
|
||||
|
||||
```yaml
vexConsumptionPolicy:
  trustEmbeddedVex: true
  minimumTrustLevel: Unverified
  filterNotAffected: true

  signatureRequirements:
    requireSignedVex: false
    trustedSigners:
      - "https://example.com/keys/vex-signer"

  timestampRequirements:
    maxAgeHours: 720
    requireTimestamp: true

  conflictResolution:
    strategy: mostRecent
    logConflicts: true

  mergePolicy:
    mode: union
    externalSources:
      - type: repository
        url: "https://vex.example.com/api"

  justificationRequirements:
    requireJustificationForNotAffected: true
    acceptedJustifications:
      - component_not_present
      - vulnerable_code_not_present
      - vulnerable_code_not_in_execute_path
      - inline_mitigations_already_exist
```
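
To make the interaction of these settings concrete, the sketch below decides whether a single not-affected statement may suppress an advisory match. It is an illustration under assumptions, not Concelier's implementation: the flattened `POLICY` dict, the statement fields (`status`, `timestamp`, `justification`), and `should_filter` are hypothetical, and signature checks are omitted.

```python
from datetime import datetime, timedelta, timezone

# Flattened copy of the policy values shown above (layout is hypothetical).
POLICY = {
    "filterNotAffected": True,
    "maxAgeHours": 720,
    "requireTimestamp": True,
    "requireJustificationForNotAffected": True,
    "acceptedJustifications": {
        "component_not_present",
        "vulnerable_code_not_present",
        "vulnerable_code_not_in_execute_path",
        "inline_mitigations_already_exist",
    },
}


def should_filter(statement: dict, now: datetime, policy: dict = POLICY) -> bool:
    """Return True if a not-affected VEX statement may suppress an advisory match."""
    if not policy["filterNotAffected"] or statement.get("status") != "not_affected":
        return False
    timestamp = statement.get("timestamp")
    if timestamp is None:
        if policy["requireTimestamp"]:
            return False
    elif now - timestamp > timedelta(hours=policy["maxAgeHours"]):
        return False  # statement is older than the allowed window
    if policy["requireJustificationForNotAffected"]:
        return statement.get("justification") in policy["acceptedJustifications"]
    return True
```

With `maxAgeHours: 720`, a statement issued 31 days ago is rejected as stale, while one issued a day ago with an accepted justification is filtered.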
|
||||
|
||||
Reports are emitted via `VexConsumptionReporter` in JSON, SARIF, and text formats. Runtime overrides can be supplied via `Concelier:VexConsumption` (Enabled, IgnoreVex, PolicyPath, TrustEmbeddedVex, MinimumTrustLevel, FilterNotAffected, ExternalVexSources).
|
||||
|
||||
## Flow
|
||||
|
||||
```
|
||||
@@ -339,23 +389,51 @@ var affected = await sbomService.GetAffectedAdvisoriesAsync(
|
||||
```sql
|
||||
CREATE TABLE vuln.sbom_registry (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
tenant_id UUID NOT NULL,
|
||||
artifact_id TEXT NOT NULL,
|
||||
sbom_digest TEXT NOT NULL,
|
||||
sbom_format TEXT NOT NULL,
|
||||
digest TEXT NOT NULL,
|
||||
format TEXT NOT NULL CHECK (format IN ('cyclonedx', 'spdx')),
|
||||
spec_version TEXT NOT NULL,
|
||||
primary_name TEXT,
|
||||
primary_version TEXT,
|
||||
component_count INT NOT NULL DEFAULT 0,
|
||||
affected_count INT NOT NULL DEFAULT 0,
|
||||
source TEXT NOT NULL,
|
||||
tenant_id TEXT,
|
||||
registered_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
last_matched_at TIMESTAMPTZ,
|
||||
CONSTRAINT uq_sbom_registry_digest UNIQUE (tenant_id, sbom_digest)
|
||||
CONSTRAINT uq_sbom_registry_digest UNIQUE (digest)
|
||||
);
|
||||
|
||||
CREATE TABLE vuln.sbom_canonical_match (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
sbom_id UUID NOT NULL REFERENCES vuln.sbom_registry(id),
|
||||
canonical_id UUID NOT NULL REFERENCES vuln.advisory_canonical(id),
|
||||
matched_purl TEXT NOT NULL,
|
||||
purl TEXT NOT NULL,
|
||||
match_method TEXT NOT NULL,
|
||||
confidence NUMERIC(3,2) NOT NULL DEFAULT 1.0,
|
||||
is_reachable BOOLEAN NOT NULL DEFAULT false,
|
||||
is_deployed BOOLEAN NOT NULL DEFAULT false,
|
||||
matched_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
CONSTRAINT uq_sbom_canonical_match UNIQUE (sbom_id, canonical_id)
|
||||
CONSTRAINT uq_sbom_canonical_match UNIQUE (sbom_id, canonical_id, purl)
|
||||
);
|
||||
|
||||
CREATE TABLE concelier.sbom_documents (
|
||||
id UUID PRIMARY KEY DEFAULT gen_random_uuid(),
|
||||
serial_number TEXT NOT NULL,
|
||||
artifact_digest TEXT,
|
||||
format TEXT NOT NULL CHECK (format IN ('cyclonedx', 'spdx')),
|
||||
spec_version TEXT NOT NULL,
|
||||
component_count INT NOT NULL DEFAULT 0,
|
||||
service_count INT NOT NULL DEFAULT 0,
|
||||
vulnerability_count INT NOT NULL DEFAULT 0,
|
||||
has_crypto BOOLEAN NOT NULL DEFAULT false,
|
||||
has_services BOOLEAN NOT NULL DEFAULT false,
|
||||
has_vulnerabilities BOOLEAN NOT NULL DEFAULT false,
|
||||
license_ids TEXT[] NOT NULL DEFAULT '{}',
|
||||
license_expressions TEXT[] NOT NULL DEFAULT '{}',
|
||||
sbom_json JSONB NOT NULL,
|
||||
created_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
updated_at TIMESTAMPTZ NOT NULL DEFAULT NOW(),
|
||||
CONSTRAINT uq_concelier_sbom_serial UNIQUE (serial_number),
|
||||
CONSTRAINT uq_concelier_sbom_artifact UNIQUE (artifact_digest)
|
||||
);
|
||||
```
|
||||
|
||||
@@ -15,6 +15,7 @@ Provide a single, deterministic aggregation layer for cross-service UX workflows
|
||||
- Persist dashboard personalization and layout preferences.
|
||||
- Provide global search aggregation across entities.
|
||||
- Surface platform metadata for UI bootstrapping (version, build, offline status).
|
||||
- Expose analytics lake aggregates for SBOM, vulnerability, and attestation reporting.
|
||||
|
||||
## API surface (v1)
|
||||
|
||||
@@ -49,6 +50,16 @@ Provide a single, deterministic aggregation layer for cross-service UX workflows
|
||||
|
||||
### Metadata
|
||||
- GET `/api/v1/platform/metadata`
|
||||
- Response includes a capabilities list for UI bootstrapping; analytics capability is reported only when analytics storage is configured.
|
||||
|
||||
### Analytics (SBOM lake)
|
||||
- GET `/api/analytics/suppliers`
|
||||
- GET `/api/analytics/licenses`
|
||||
- GET `/api/analytics/vulnerabilities`
|
||||
- GET `/api/analytics/backlog`
|
||||
- GET `/api/analytics/attestation-coverage`
|
||||
- GET `/api/analytics/trends/vulnerabilities`
|
||||
- GET `/api/analytics/trends/components`
|
||||
|
||||
## Data model
|
||||
- `platform.dashboard_preferences` (dashboard layout, widgets, filters)
|
||||
@@ -72,11 +83,58 @@ Provide a single, deterministic aggregation layer for cross-service UX workflows
|
||||
- Preferences: `ui.preferences.read`, `ui.preferences.write`
|
||||
- Search: `search.read` plus downstream service scopes (`findings:read`, `policy:read`, etc.)
|
||||
- Metadata: `platform.metadata.read`
|
||||
- Analytics: `analytics.read`
|
||||
|
||||
## Determinism and offline posture
|
||||
- Stable ordering with explicit sort keys and deterministic tiebreakers.
|
||||
- Stable ordering with explicit sort keys and deterministic tiebreakers.
|
||||
- All timestamps in UTC ISO-8601.
|
||||
- Cache last-known snapshots for offline rendering with "data as of" markers.
|
||||
- Cache last-known snapshots for offline rendering with "data as of" markers.
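
As an illustration of the ordering and timestamp rules above, the sketch below sorts results with an explicit key plus tiebreakers and renders timestamps in UTC ISO-8601. The field names (`score`, `kind`, `id`) and both helpers are hypothetical and not taken from the Platform service.

```python
from datetime import datetime, timezone


def order_results(results):
    """Sort by descending score, then by kind and id as deterministic tiebreakers."""
    return sorted(results, key=lambda r: (-r["score"], r["kind"], r["id"]))


def utc_iso(ts: datetime) -> str:
    """Render an aware datetime as a UTC ISO-8601 string with a trailing Z."""
    return ts.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

Because every field in the sort key is explicit, the output order does not depend on the order in which downstream services respond.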
|
||||
|
||||
## Analytics ingestion configuration
|
||||
|
||||
Analytics ingestion runs inside the Platform WebService and subscribes to Scanner, Concelier, and Attestor streams. Configure ingestion with `Platform:AnalyticsIngestion`:
|
||||
|
||||
```yaml
Platform:
  AnalyticsIngestion:
    Enabled: true
    PostgresConnectionString: "" # optional; defaults to Platform:Storage
    AllowedTenants: ["tenant-a"]
    Streams:
      ScannerStream: "orchestrator:events"
      ConcelierObservationStream: "concelier:advisory.observation.updated:v1"
      ConcelierLinksetStream: "concelier:advisory.linkset.updated:v1"
      AttestorStream: "attestor:events"
      StartFromBeginning: false
    Cas:
      RootPath: "/var/lib/stellaops/cas"
      DefaultBucket: "attestations"
    Attestations:
      BundleUriTemplate: "bundle:{digest}"
```
|
||||
|
||||
`BundleUriTemplate` supports `{digest}` and `{hash}` placeholders. The `bundle:` scheme maps to `cas://<DefaultBucket>/{digest}` by default. Verify offline bundles with `stella bundle verify` before ingestion.
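
The template expansion can be sketched as below. Two points are assumptions and not documented behaviour: that `{hash}` is the digest without its algorithm prefix (for example `abc` for `sha256:abc`), and that the `bundle:` prefix is rewritten by simple string replacement. `expand_bundle_uri` is a hypothetical helper.

```python
def expand_bundle_uri(template: str, digest: str, default_bucket: str = "attestations") -> str:
    """Expand {digest}/{hash} placeholders and map the bundle: scheme to CAS."""
    # Assumption: {hash} is the part of the digest after "algorithm:".
    hash_part = digest.split(":", 1)[-1]
    uri = template.replace("{digest}", digest).replace("{hash}", hash_part)
    if uri.startswith("bundle:"):
        # bundle:<x> maps to cas://<DefaultBucket>/<x>
        uri = f"cas://{default_bucket}/" + uri[len("bundle:"):]
    return uri
```

With the default configuration shown above, `bundle:{digest}` for digest `sha256:abc` would resolve to `cas://attestations/sha256:abc` in this sketch.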
|
||||
|
||||
## Analytics maintenance configuration
|
||||
Analytics rollups and materialized view refreshes are driven by `PlatformAnalyticsMaintenanceService` when analytics storage is configured. Use `BackfillDays` to recompute recent rollups on the first maintenance run (set to `0` to disable).
|
||||
|
||||
```yaml
Platform:
  Storage:
    PostgresConnectionString: "Host=...;Database=...;Username=...;Password=..."
  AnalyticsMaintenance:
    Enabled: true
    RunOnStartup: true
    IntervalMinutes: 1440
    ComputeDailyRollups: true
    RefreshMaterializedViews: true
    BackfillDays: 7
```
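
To show what a backfill window might look like, the sketch below lists the days whose rollups would be recomputed. It assumes the window covers the `BackfillDays` days before the current day and excludes the current day itself; the service's actual window boundaries are not documented here, and `backfill_dates` is a hypothetical helper.

```python
from datetime import date, timedelta


def backfill_dates(today: date, backfill_days: int) -> list:
    """Return the days to recompute, oldest first; empty when backfill is disabled."""
    if backfill_days <= 0:
        return []
    return [today - timedelta(days=n) for n in range(backfill_days, 0, -1)]
```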
|
||||
|
||||
## Observability
|
||||
- Metrics: `platform.aggregate.latency_ms`, `platform.aggregate.errors_total`, `platform.aggregate.cache_hits_total`
|
||||
|
||||
@@ -17,6 +17,7 @@ The service operates strictly downstream of the **Aggregation-Only Contract (AOC
|
||||
|
||||
- Compile and evaluate `stella-dsl@1` policy packs into deterministic verdicts.
|
||||
- Join SBOM inventory, Concelier advisories, and Excititor VEX evidence via canonical linksets and equivalence tables.
|
||||
- Evaluate SBOM license expressions against policy (SPDX AND/OR/WITH/+), emitting compliance findings and attribution requirements for gate decisions.
|
||||
- Materialise effective findings (`effective_finding_{policyId}`) with append-only history and produce explain traces.
|
||||
- Emit CVSS v4.0 receipts with canonical hashing and policy replay/backfill rules; store tenant-scoped receipts with RBAC; export receipts deterministically (UTC/fonts/order) and flag v3.1→v4.0 conversions (see Sprint 0190 CVSS-GAPS-190-014 / `docs/modules/policy/cvss-v4.md`).
|
||||
- Emit per-finding OpenVEX decisions anchored to reachability evidence, forward them to Signer/Attestor for DSSE/Rekor, and publish the resulting artifacts for bench/verification consumers.
|
||||
@@ -171,9 +172,52 @@ The Determinization subsystem calculates uncertainty scores based on signal comp
|
||||
**Usage in policies:**
|
||||
|
||||
Determinization scores are exposed to SPL policies via the `signals.trust.*` and `signals.uncertainty.*` namespaces. Use `signals.uncertainty.entropy` to access entropy values and `signals.trust.score` for aggregated trust scores that combine VEX, reachability, runtime, and other signals with decay/weighting.
|
||||
|
||||
### 3.2 - License compliance configuration
|
||||
|
||||
License compliance evaluation runs during SBOM evaluation when enabled in
|
||||
`licenseCompliance` settings.
|
||||
|
||||
```json
{
  "licenseCompliance": {
    "enabled": true,
    "policyPath": "policies/license-policy.yaml"
  }
}
```
|
||||
|
||||
- `sbom.license` exposes the compliance report (findings, conflicts, inventory).
|
||||
- `sbom.license_status` exposes `pass`, `warn`, or `fail` (or `unknown` when disabled).
|
||||
- Failures set the policy verdict status to `blocked` and emit `license.*` annotations.
|
||||
- Trademark notice obligations are tracked alongside attribution requirements and produce warn-level findings.
|
||||
- License compliance reports support JSON, text/markdown/html, legal-review, and PDF outputs.
|
||||
- Category breakdown includes percent totals and chart renderings (ASCII chart in text/markdown/legal-review/PDF, pie chart in HTML).
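
The effect of `sbom.license_status` on a verdict can be sketched as follows. The verdict shape and the `license.status` annotation key are hypothetical (the text above only says `license.*` annotations are emitted), so treat this as an illustration of the rule, not the engine's data model.

```python
def apply_license_status(verdict: dict, license_status: str) -> dict:
    """Return a copy of the verdict, blocked and annotated when the license check fails."""
    updated = dict(verdict)
    annotations = dict(verdict.get("annotations", {}))
    if license_status == "fail":
        updated["status"] = "blocked"
        annotations["license.status"] = "fail"  # hypothetical key under license.*
    updated["annotations"] = annotations
    return updated
```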
|
||||
---
|
||||
|
||||
## 4 · Data Model & Persistence
|
||||
### 3.3 - NTIA compliance configuration
|
||||
|
||||
NTIA minimum-elements validation runs when enabled under `ntiaCompliance`.
|
||||
|
||||
```json
{
  "ntiaCompliance": {
    "enabled": true,
    "enforceGate": false,
    "policyPath": "policies/ntia-policy.yaml"
  }
}
```
|
||||
|
||||
- `sbom.ntia` exposes NTIA compliance details (elements, findings, supplier status).
|
||||
- `sbom.ntia_status` exposes `pass`, `warn`, `fail`, or `unknown`.
|
||||
- NTIA compliance can be configured as an advisory-only check or a release gate via `enforceGate`.
|
||||
- The NTIA policy supports element selection, supplier validation (placeholder patterns, trusted/blocked lists), and framework-specific requirements.
|
||||
- Reports support JSON, text/markdown/html, and PDF output for regulatory submissions.
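
The difference between advisory-only mode and a release gate comes down to whether a failing status blocks. A minimal sketch, assuming that only a `fail` status blocks when `enforceGate` is true; `ntia_blocks_release` is a hypothetical helper.

```python
def ntia_blocks_release(ntia_status: str, enforce_gate: bool) -> bool:
    """Return True only when the gate is enforced and NTIA validation failed."""
    return enforce_gate and ntia_status == "fail"
```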
|
||||
|
||||
---
|
||||
|
||||
## 4 · Data Model & Persistence
|
||||
|
||||
### 4.1 Collections
|
||||
|
||||
|
||||
@@ -382,19 +382,19 @@ public class EvidenceHashDeterminismTests
|
||||
### Run All Tests
|
||||
|
||||
```bash
|
||||
dotnet test src/StellaOps.sln
|
||||
dotnet test src/ReleaseOrchestrator/StellaOps.ReleaseOrchestrator.sln
|
||||
```
|
||||
|
||||
### Run Only Unit Tests
|
||||
|
||||
```bash
|
||||
dotnet test src/StellaOps.sln --filter "Category=Unit"
|
||||
dotnet test src/ReleaseOrchestrator/StellaOps.ReleaseOrchestrator.sln --filter "Category=Unit"
|
||||
```
|
||||
|
||||
### Run Only Integration Tests
|
||||
|
||||
```bash
|
||||
dotnet test src/StellaOps.sln --filter "Category=Integration"
|
||||
dotnet test src/ReleaseOrchestrator/StellaOps.ReleaseOrchestrator.sln --filter "Category=Integration"
|
||||
```
|
||||
|
||||
### Run Specific Test Class
|
||||
@@ -406,7 +406,7 @@ dotnet test --filter "FullyQualifiedName~PromotionValidatorTests"
|
||||
### Run with Coverage
|
||||
|
||||
```bash
|
||||
dotnet test src/StellaOps.sln --collect:"XPlat Code Coverage"
|
||||
dotnet test src/ReleaseOrchestrator/StellaOps.ReleaseOrchestrator.sln --collect:"XPlat Code Coverage"
|
||||
```
|
||||
|
||||
---
|
||||
|
||||
@@ -14,10 +14,14 @@
|
||||
**Boundaries.**
|
||||
|
||||
* Scanner **does not** produce PASS/FAIL. The backend (Policy + Excititor + Concelier) decides presentation and verdicts.
|
||||
* Scanner **does not** keep third‑party SBOM warehouses. It may **bind** to existing attestations for exact hashes.
|
||||
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plug‑ins (e.g., patch‑presence) run under explicit flags and never contaminate the core SBOM.
|
||||
|
||||
---
|
||||
* Scanner **does not** keep third‑party SBOM warehouses. It may **bind** to existing attestations for exact hashes.
|
||||
* Core analyzers are **deterministic** (no fuzzy identity). Optional heuristic plug‑ins (e.g., patch‑presence) run under explicit flags and never contaminate the core SBOM.
|
||||
|
||||
SBOM dependency reachability inference uses dependency graphs to reduce false positives and apply reachability-aware severity adjustments. See `src/Scanner/docs/sbom-reachability-filtering.md` for policy configuration and reporting expectations.
|
||||
|
||||
---
|
||||
|
||||
## 1) Solution & project layout
|
||||
|
||||
@@ -374,7 +378,40 @@ public sealed record BinaryFindingEvidence
|
||||
|
||||
The emitted `buildId` metadata is preserved in component hashes, diff payloads, and `/policy/runtime` responses so operators can pivot from SBOM entries → runtime events → `debug/.build-id/<aa>/<rest>.debug` within the Offline Kit or release bundle.
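
The `debug/.build-id/<aa>/<rest>.debug` layout splits the build ID after its first two hex characters. A small sketch of that mapping, assuming the build ID is given as a hex string; `debug_path` is a hypothetical helper.

```python
def debug_path(build_id: str) -> str:
    """Map a hex build ID to its debug/.build-id/<aa>/<rest>.debug path."""
    bid = build_id.lower()
    if len(bid) < 3 or any(c not in "0123456789abcdef" for c in bid):
        raise ValueError(f"not a usable hex build ID: {build_id!r}")
    # First two hex characters name the directory, the rest names the file.
    return f"debug/.build-id/{bid[:2]}/{bid[2:]}.debug"
```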
|
||||
|
||||
### 5.6 DSSE attestation (via Signer/Attestor)
|
||||
### 5.5.1 Service security analysis (Sprint 20260119_016)
|
||||
|
||||
When an SBOM path is provided, the worker runs the `service-security` stage to parse CycloneDX services and emit a deterministic report covering:
|
||||
|
||||
- Endpoint scheme hygiene (HTTP/WS/plaintext protocol detection).
|
||||
- Authentication and trust-boundary enforcement.
|
||||
- Sensitive data flow exposure and unencrypted transfers.
|
||||
- Deprecated service versions and rate-limiting metadata gaps.
|
||||
|
||||
Inputs are passed via scan metadata (`sbom.path` or `sbomPath`, plus `sbom.format`). The report is attached as a surface observation payload (`service-security.report`) and keyed in the analysis store for downstream policy and report assembly. See `src/Scanner/docs/service-security.md` for the policy schema and output formats.
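
The endpoint scheme hygiene check listed above can be illustrated with a few lines of Python. This is a sketch under assumptions: the set of plaintext schemes beyond `http` and `ws` is a guess, and `plaintext_endpoints` is a hypothetical helper, not the `service-security` stage itself.

```python
from urllib.parse import urlsplit

# http and ws come from the description above; ftp and telnet are assumed extras.
PLAINTEXT_SCHEMES = {"http", "ws", "ftp", "telnet"}


def plaintext_endpoints(endpoints):
    """Return the endpoints whose URL scheme is an unencrypted protocol."""
    return [e for e in endpoints if urlsplit(e).scheme.lower() in PLAINTEXT_SCHEMES]
```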
|
||||
|
||||
### 5.5.2 CBOM crypto analysis (Sprint 20260119_017)
|
||||
|
||||
When an SBOM includes CycloneDX `cryptoProperties`, the worker runs the `crypto-analysis` stage to produce a crypto inventory and compliance findings for weak algorithms, short keys, deprecated protocol versions, certificate hygiene, and post-quantum readiness. The report is attached as a surface observation payload (`crypto-analysis.report`) and keyed in the analysis store for downstream evidence workflows. See `src/Scanner/docs/crypto-analysis.md` for the policy schema and inventory export formats.
|
||||
|
||||
### 5.5.3 AI/ML supply chain security (Sprint 20260119_018)
|
||||
|
||||
When an SBOM includes CycloneDX `modelCard` or SPDX AI profile data, the worker runs the `ai-ml-security` stage to evaluate model governance readiness. The report covers model card completeness, training data provenance, bias/fairness checks, safety risk assessment coverage, and provenance verification. The report is attached as a surface observation payload (`ai-ml-security.report`) and keyed in the analysis store for policy evaluation and audit trails. See `src/Scanner/docs/ai-ml-security.md` for policy schema, CLI toggles, and binary analysis conventions.
|
||||
|
||||
### 5.5.4 Build provenance verification (Sprint 20260119_019)
|
||||
|
||||
When an SBOM includes CycloneDX formulation or SPDX build profile data, the worker runs the `build-provenance` stage to verify provenance completeness, builder trust, source integrity, hermetic build requirements, and optional reproducibility checks. The report is attached as a surface observation payload (`build-provenance.report`) and keyed in the analysis store for policy enforcement and audit evidence. See `src/Scanner/docs/build-provenance.md` for policy schema, CLI toggles, and report formats.
|
||||
|
||||
### 5.5.5 SBOM dependency reachability (Sprint 20260119_022)
|
||||
|
||||
When configured, the worker runs the `reachability-analysis` stage to infer dependency reachability from SBOM graphs and optionally refine it with a `richgraph-v1` call graph. Advisory matches are filtered or severity-adjusted using `VulnerabilityReachabilityFilter`, with false-positive reduction metrics recorded for auditability. The stage attaches:
|
||||
|
||||
- `reachability.report` (JSON) for component and vulnerability reachability.
|
||||
- `reachability.report.sarif` (SARIF 2.1.0) for toolchain export.
|
||||
- `reachability.graph.dot` (GraphViz) for dependency visualization.
|
||||
|
||||
Configuration lives in `src/Scanner/docs/sbom-reachability-filtering.md`, including policy schema, metadata keys, and report outputs.
|
||||
|
||||
### 5.6 DSSE attestation (via Signer/Attestor)
|
||||
|
||||
* WebService constructs **predicate** with `image_digest`, `stellaops_version`, `license_id`, `policy_digest?` (when emitting **final reports**), timestamps.
|
||||
* Calls **Signer** (requires **OpTok + PoE**); Signer verifies **entitlement + scanner image integrity** and returns **DSSE bundle**.
|
||||
|
||||