Merge branch 'main' of https://git.stella-ops.org/stella-ops.org/git.stella-ops.org
Some checks failed
Docs CI / lint-and-preview (push) Has been cancelled

2025-10-30 00:12:41 +02:00
39 changed files with 2130 additions and 164 deletions


@@ -1,95 +1,379 @@
# 3 · Product Vision: **StellaOps**
*(v1.3, 12-Jul-2025 · supersedes v1.2; expanded with ecosystem integration, refined metrics, and alignment to emerging trends)*
## 1) Problem Statement & Goals
We ship containers. We need:
- **Authenticity & integrity** of build artifacts and metadata.
- **Provenance** attached to artifacts, not platforms.
- **Transparency** to detect tampering and retroactive edits.
- **Determinism & explainability** so scanner judgments can be replayed and justified.
- **Actionability** to separate theoretical from exploitable risk (VEX).
- **Minimal trust** across multi-tenant and third-party boundaries.
**Non-goals:** Building a new package manager, inventing new SBOM/attestation formats, or depending on closed standards.
---
## 0 · Preamble
## 2) Golden Path (Minimal End-to-End Flow)
This Vision builds on the purpose and gap analysis defined in **01WHY**.
It paints a three-year “north-star” picture of success for the open-source project and sets the measurable guardrails that every roadmap item must serve, while fostering ecosystem growth and adaptability to trends like SBOM mandates, AI-assisted security **and transparent usage quotas**.
```mermaid
flowchart LR
A[Source / Image / Rootfs] --> B[SBOM Producer\nCycloneDX 1.6]
B --> C[Signer\nin-toto Attestation + DSSE]
C --> D[Transparency\nSigstore Rekor - optional but RECOMMENDED]
D --> E[Durable Storage\nSBOMs, Attestations, Proofs]
E --> F[Scanner\nPkg analyzers + Entrytrace + Layer cache]
F --> G[VEX Authoring\nOpenVEX + SPDX 3.0.1 relationships]
G --> H[Policy Gate\nOPA/Rego: allow/deny + waivers]
H --> I[Artifacts Store\nReports, SARIF, VEX, Audit log]
```
**Adopted standards (pinned for interoperability):**
* **SBOM:** CycloneDX **1.6** (JSON/XML)
* **Attestation & signing:** **in-toto Attestations** (Statement + Predicate) in **DSSE** envelopes
* **Transparency:** **Sigstore Rekor** (inclusion proofs, monitoring)
* **Exploitability:** **OpenVEX** (statuses & justifications)
* **Modeling & interop:** **SPDX 3.0.1** (relationships / VEX modeling)
* **Findings interchange (optional):** SARIF for analyzer output
> Pinnings are *policy*, not claims about “latest”. We may update pins via normal change control.
---
## 1 · North-Star Vision Statement (2027)
## 3) Security Invariants (What MUST Always Hold)
> *By mid-2027, StellaOps is the fastest, most-trusted self-hosted SBOM scanner. Developers expect vulnerability feedback in **five seconds or less**, even while the free tier enforces a transparent **{{ quota_token }} scans/day** limit with graceful waiting. The project thrives on a vibrant plugin marketplace, weekly community releases, transparent governance, and seamless integrations with major CI/CD ecosystems, while never breaking the five-second promise.*
1. **Artifact identity is content-addressed.**
* All identities are SHA256 digests of immutable blobs (images, SBOMs, attestations).
2. **Every SBOM is signed.**
* SBOMs MUST be wrapped in **in-toto DSSE** attestations tied to the container digest.
3. **Provenance is attached, not implied.**
* Build metadata (who/where/how) MUST ride as attestations linked by digest.
4. **Transparency-first mindset.**
* Signatures/attestations SHOULD be logged to **Rekor**, and their inclusion proofs SHOULD be stored alongside them.
5. **Determinism & replay.**
* Scans MUST be reproducible given: input digests, scanner version, DB snapshot, and config.
6. **Explainability.**
* Findings MUST show the *why*: package → file path → call stack / entrypoint (when available).
7. **Exploitability over enumeration.**
* Risk MUST be communicated via **VEX** (OpenVEX), including **under_investigation** where appropriate.
8. **Least privilege & minimal trust.**
* Build keys are short-lived; scanners run on ephemeral, least-privileged workers.
9. **Air-gap friendly.**
* Mirrors for vuln DBs and containers; all verification MUST work without public egress.
10. **No hidden blockers.**
* Policy gates MUST be code-reviewable (e.g., Rego) and auditable; waivers are attestations, not emails.
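
Invariant 1 can be illustrated with a minimal Python sketch. The helper names `content_digest` and `same_artifact` are hypothetical and not part of any Stella tooling; the only assumption is that identities are written as `sha256:<hex>` over the raw blob bytes.

```python
import hashlib


def content_digest(blob: bytes) -> str:
    """Return the content address of a blob as 'sha256:<hex>'."""
    return "sha256:" + hashlib.sha256(blob).hexdigest()


def same_artifact(a: bytes, b: bytes) -> bool:
    """Two blobs are the same artifact exactly when their digests match."""
    return content_digest(a) == content_digest(b)


if __name__ == "__main__":
    sbom = b'{"bomFormat": "CycloneDX", "specVersion": "1.6"}'
    # The identity depends only on the bytes, never on a tag or file name.
    print(content_digest(sbom))
```

Because the identity is derived from the bytes alone, any edit to an SBOM or attestation yields a different address, which is what makes digest-linked attestations meaningful.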
---
## 2 · Outcomes & Success Metrics
## 4) Trust Boundaries & Roles
| KPI (community-centric)          | Baseline Jul-2025 | Target Q2-2026 | North-Star 2027 |
| -------------------------------- | ----------------- | -------------- | --------------- |
| ⭐ Gitea / GitHub stars           | 0                 | 4000           | 10000           |
| Weekly active Docker pulls       | 0                 | 1500           | 4000            |
| P95 SBOM scan time (alpine)      | 5 s               | **5 s**        | **4 s**         |
| Free-tier scan satisfaction*     | n/a               | ≥ 90%          | ≥ 95%           |
| First-time-contributor PRs / qtr | 0                 | 15             | 30              |
<!-- ```mermaid
flowchart TB
subgraph DevTenant[Dev Tenant]
SRC[Source Code]
CI[CI Runner]
end
subgraph SecPlatform[Security Platform]
SB[SBOM Service]
AT[Attestation Service]
TR[Transparency Client]
SCN[Scanner Pool]
POL[Policy Gate]
ST[Artifacts Store]
end
subgraph External[External/3rd-party]
REG[Container Registry]
REK[Rekor]
end
\*Measured via anonymous telemetry *opt-in only*: ratio of successful scans to `429 QuotaExceeded` errors.
SRC --> CI
CI -->|image digest| REG
REG -->|pull by digest| SB
SB --> AT --> TR --> REK
AT --> ST
REK --> ST
ST --> SCN --> POL --> ST
``` -->
* **Build/CI:** Holds signing capability (short-lived keys or keyless signing).
* **Registry:** Source of truth for image bytes; access via digest only.
* **Scanner Pool:** Ephemeral nodes; content-addressed caches; no shared mutable state.
* **Artifacts Store:** Immutable, WORM-like storage for SBOMs, attestations, proofs, SARIF, VEX.
---
## 3 · Strategic Pillars
## 5) Data & Evidence We Persist
1. **Speed-First**: preserve the sub-5 s P95 wall-time; any feature that hurts it must ship behind a toggle or plugin. **Quota throttling must apply a soft 5 s delay first, so “speed first” remains true even at the limit.**
2. **Offline-by-Design**: every byte required to scan ships in public images; Internet access is optional.
3. **Modular-Forever**: capabilities land as hot-load plugins; the monolith can split without rewrites.
4. **Community-Ownership**: ADRs and governance decisions live in public; new maintainers are elected by meritocracy.
5. **Zero-Surprise Upgrades & Limits**: SemVer discipline; `main` is always installable; minor upgrades never break CI YAML, **and free-tier limits are clearly documented, with early UI warnings.**
6. **Ecosystem Harmony**: prioritise integrations with popular OSS tools (e.g., Trivy extensions, BuildKit hooks) to lower adoption barriers.
| Artifact | MUST Persist | Why |
| -------------------- | ------------------------------------ | ---------------------------- |
| SBOM (CycloneDX 1.6) | Raw file + DSSE attestation | Reproducibility, audit |
| in-toto Statement    | Full JSON                            | Traceability                 |
| Rekor entry          | UUID + inclusion proof               | Tamper-evidence              |
| Scanner output | SARIF + raw notes | Triage & tooling interop |
| VEX | OpenVEX + links to findings | Noise reduction & compliance |
| Policy decisions | Input set + decision + rule versions | Governance & forensics |
Retention follows our Compliance policy; default **≥ 18 months**.
---
## 4 · Roadmap Themes (18-24 months)
## 6) Scanner Requirements (Determinism & Explainability)
| Horizon | Theme | Example EPIC |
| ------------------ | ----------------------- | ---------------------------------------------------------------------------------------------------------------------------------- |
| **Q3-2025** (3 mo) | **Core Stability & UX** | One-command installer; dark-mode UI; baseline SBOM scanning; **Free-tier Quota Service ({{ quota_token }} scans/day, early banner, wait-wall).** |
| 6-12 mo | *Extensibility* | Scan-service micro-split PoC; community plugin marketplace beta. |
| 12-18 mo | *Ecosystem* | Community plugin marketplace launch; integrations with Syft and Harbor. |
| 18-24 mo | *Resilience & Scale* | Redis Cluster auto-sharding; AI-assisted triage plugin framework. |
*(Granular decomposition lives in 25_LEDGER.md.)*
* **Inputs pinned:** image digest(s), SBOM(s), scanner version, vuln DB snapshot date, config hash.
* **Explainability:** show file paths, package coords (e.g., purl), and, when possible, the **entrytrace/call stack** from executable entrypoints to vulnerable symbol(s).
* **Caching:** content-addressed per-layer & per-ecosystem caches; warming does not change decisions.
* **Unknowns:** output **under_investigation** where exploitability is not yet known; roll into VEX.
* **Interchange:** emit **SARIF** for IDE and pipeline consumption (optional but recommended).
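
One way to make the "inputs pinned" requirement concrete is to derive a single replay key from the pinned inputs. The sketch below is an assumption-laden illustration: `scan_replay_key` and its field names are hypothetical, and the canonical form (sorted keys, compact separators) is one reasonable choice rather than a documented Stella format.

```python
import hashlib
import json


def scan_replay_key(image_digests, sbom_digests, scanner_version,
                    db_snapshot, config_hash):
    """Hash the pinned scan inputs into one stable key (hypothetical helper)."""
    pinned = {
        # Sort the digest lists so input order cannot change the key.
        "images": sorted(image_digests),
        "sboms": sorted(sbom_digests),
        "scanner_version": scanner_version,
        "db_snapshot": db_snapshot,
        "config_hash": config_hash,
    }
    # Canonical serialisation: sorted keys, no insignificant whitespace.
    canonical = json.dumps(pinned, sort_keys=True, separators=(",", ":"))
    return "sha256:" + hashlib.sha256(canonical.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    key = scan_replay_key(["sha256:aaa"], ["sha256:bbb"], "1.0.0",
                          "2025-10-01", "sha256:ccc")
    print(key)
```

Two scans with the same key are expected to produce the same findings; a changed DB snapshot or scanner version yields a different key, so cached results are never silently reused across them.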
---
## 5 · Stakeholder Personas & Benefits
## 7) Policy Gate (OPA/Rego) — Examples
| Persona | Core Benefit |
| --------------------- | ---------------------------------------------------------------- |
| Solo OSS maintainer   | Laptop scans in **5 s**; zero cloud reliance.                     |
| CI Platform Engineer  | Single-binary backend + Redis; stable YAML integrations.          |
| Security Auditor      | AGPL code, traceable CVE sources, reproducible benchmarks.        |
| Community Contributor | Plugin hooks and good-first issues; merit-based maintainer path.  |
| Budget-conscious Lead | Clear **{{ quota_token }} scans/day** allowance before upgrades are required. |
> Gate runs after scan + VEX merge. It treats VEX as first-class input.
(See **01WHY §3** for detailed pain points & evidence.)
### 7.1 Deny unreconciled criticals that are exploitable
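```rego
package stella.policy

import rego.v1

default allow := false

# A finding is exploitable when it is critical and marked as affected.
exploitable(v) if {
    v.severity == "CRITICAL"
    v.exploitability == "affected"
}

# Allow only when no exploitable finding is left unwaived.
allow if {
    not exploitable_some
}

exploitable_some if {
    some v in input.findings
    exploitable(v)
    not waived(v.id)
}

# A waiver is a VEX entry marking the finding not_affected with a justification.
waived(id) if {
    some w in input.vex
    w.vuln_id == id
    w.status == "not_affected"
    w.justification != ""
}
```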
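
To show the input shape this rule expects, here is a small Python mirror of the same decision logic, run against a sample gate input. It is a sketch for reasoning about the policy, not a replacement for OPA; the sample finding and VEX values are invented for illustration.

```python
def waived(vuln_id, vex):
    """True when a VEX entry marks the finding not_affected with a justification."""
    return any(
        w.get("vuln_id") == vuln_id
        and w.get("status") == "not_affected"
        and w.get("justification", "") != ""
        for w in vex
    )


def allow(gate_input):
    """Deny as soon as one critical, affected finding has no waiver."""
    vex = gate_input.get("vex", [])
    for finding in gate_input.get("findings", []):
        exploitable = (
            finding.get("severity") == "CRITICAL"
            and finding.get("exploitability") == "affected"
        )
        if exploitable and not waived(finding.get("id"), vex):
            return False
    return True


if __name__ == "__main__":
    sample = {
        "findings": [
            {"id": "CVE-2025-0001", "severity": "CRITICAL", "exploitability": "affected"}
        ],
        "vex": [],
    }
    print(allow(sample))  # denied: the critical finding has no waiver
```

Adding a VEX entry with `vuln_id` equal to the finding id, `status` of `not_affected` and a non-empty `justification` flips the decision to allow, mirroring the `waived` rule.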
### 7.2 Require Rekor inclusion for attestations
```rego
package stella.policy

import rego.v1

# Collect one message per attestation that has no Rekor inclusion proof.
violation contains msg if {
    some a in input.attestations
    not a.rekor.inclusion_proof
    msg := sprintf("Attestation %s lacks Rekor inclusion proof", [a.id])
}
```
---
## 6 · Non-Goals (2025-2027)
## 8) Version Pins & Compatibility
* Multi-tenant SaaS offering.
* Automated “fix PR” generation.
* Proprietary compliance certifications (left to downstream distros).
* Windows **container** scanning (agents only).
| Domain | Standard | Stella Pin | Notes |
| ------------ | -------------- | ---------------- | ------------------------------------------------ |
| SBOM | CycloneDX | **1.6** | JSON or XML accepted; JSON preferred |
| Attestation  | in-toto        | **Statement v1** | Predicates per use case (e.g., sbom, provenance) |
| Envelope | DSSE | **v1** | Canonical JSON payloads |
| Transparency | Sigstore Rekor | **API stable** | Inclusion proof stored alongside artifacts |
| VEX | OpenVEX | **spec current** | Map to SPDX 3.0.1 relationships as needed |
| Interop      | SPDX           | **3.0.1**        | Use for modeling & cross-ecosystem exchange      |
| Findings | SARIF | **2.1.0** | Optional but recommended |
---
## 7 · Review & Change Process
## 9) Minimal CLI Playbook (Illustrative)
* **Cadence:** product owner leads a public Vision review every **2 sprints (≈ 1 quarter)**.
* **Amendments:** material changes require PR labelled `type:vision` + two maintainer approvals.
* **Versioning:** bump patch for typo, minor for KPI tweak, major if North-Star statement shifts.
* **Community Feedback:** Open GitHub Discussions for input; incorporate top-voted suggestions quarterly.
> Commands below are illustrative; wire them into CI with short-lived credentials.
```bash
# 1) Produce SBOM (CycloneDX 1.6) from image digest
syft registry:5000/myimg@sha256:... -o cyclonedx-json > sbom.cdx.json

# 2) Create in-toto DSSE attestation bound to the image digest
cosign attest --predicate sbom.cdx.json \
  --type https://stella-ops.org/attestations/sbom/1 \
  --key env://COSIGN_KEY \
  registry:5000/myimg@sha256:...

# 3) (Optional but recommended) Rekor transparency
cosign sign --key env://COSIGN_KEY registry:5000/myimg@sha256:...
cosign verify-attestation --type ... --certificate-oidc-issuer https://token.actions... registry:5000/myimg@sha256:... > rekor-proof.json

# 4) Scan (pinned DB snapshot)
stella-scan --image registry:5000/myimg@sha256:... \
  --sbom sbom.cdx.json \
  --db-snapshot 2025-10-01 \
  --out findings.sarif

# 5) Emit VEX
stella-vex --from findings.sarif --policy vex-policy.yaml --out vex.json

# 6) Gate
opa eval -i gate-input.json -d policy/ -f pretty "data.stella.policy.allow"
```
---
## 10) JSON Skeletons (Copy-Ready)
### 10.1 in-toto Statement (DSSE payload)
```json
{
  "_type": "https://in-toto.io/Statement/v1",
  "subject": [
    {
      "name": "registry:5000/myimg",
      "digest": { "sha256": "IMAGE_DIGEST_SHA256" }
    }
  ],
  "predicateType": "https://stella-ops.org/attestations/sbom/1",
  "predicate": {
    "sbomFormat": "CycloneDX",
    "sbomVersion": "1.6",
    "mediaType": "application/vnd.cyclonedx+json",
    "location": "sha256:SBOM_BLOB_SHA256"
  }
}
```
### 10.2 DSSE Envelope (wrapping the Statement)
```json
{
  "payloadType": "application/vnd.in-toto+json",
  "payload": "BASE64URL_OF_CANONICAL_STATEMENT_JSON",
  "signatures": [
    {
      "keyid": "KEY_ID_OR_CERT_ID",
      "sig": "BASE64URL_SIGNATURE"
    }
  ]
}
```
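
For orientation, the DSSE specification does not sign the payload directly: the signature covers a pre-authentication encoding (PAE) of the payload type and the exact payload bytes. The Python sketch below builds an envelope of the shape shown above. The `sign` callable and the HMAC stand-in are assumptions made so the example runs on its own; a real pipeline would use an asymmetric signer such as the one behind `cosign`.

```python
import base64
import hashlib
import hmac
import json


def pae(payload_type: str, payload: bytes) -> bytes:
    """DSSE pre-authentication encoding: 'DSSEv1 LEN(type) type LEN(body) body'."""
    type_bytes = payload_type.encode("utf-8")
    return b"DSSEv1 %d %s %d %s" % (len(type_bytes), type_bytes, len(payload), payload)


def make_envelope(payload_type: str, payload: bytes, sign, keyid: str) -> dict:
    """Wrap payload bytes in a DSSE envelope; `sign` maps bytes to signature bytes."""
    signature = sign(pae(payload_type, payload))
    return {
        "payloadType": payload_type,
        "payload": base64.urlsafe_b64encode(payload).decode("ascii"),
        "signatures": [
            {"keyid": keyid, "sig": base64.urlsafe_b64encode(signature).decode("ascii")}
        ],
    }


def demo_sign(message: bytes) -> bytes:
    # Stand-in only: HMAC with a fixed demo key, NOT a real DSSE signer.
    return hmac.new(b"demo-key", message, hashlib.sha256).digest()


if __name__ == "__main__":
    statement = json.dumps({"_type": "https://in-toto.io/Statement/v1"}).encode("utf-8")
    envelope = make_envelope("application/vnd.in-toto+json", statement, demo_sign, "demo")
    print(json.dumps(envelope, indent=2))
```

Verification recomputes the PAE from the decoded `payload` and `payloadType` and checks the signature over those bytes, so the payload must be stored exactly as it was signed.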
### 10.3 OpenVEX (compact)
```json
{
  "@context": "https://openvex.dev/ns/v0.2.0",
  "@id": "VEX_DOCUMENT_IRI",
  "author": "Stella Ops Security",
  "timestamp": "2025-10-29T00:00:00Z",
  "version": 1,
  "statements": [
    {
      "vulnerability": { "name": "CVE-2025-0001" },
      "products": [{ "@id": "pkg:purl/example@1.2.3?arch=amd64" }],
      "status": "under_investigation",
      "status_notes": "Analysis ongoing.",
      "timestamp": "2025-10-29T00:00:00Z"
    }
  ]
}
```
---
## 11) Handling “Unknowns” & Noise
* Use **OpenVEX** statuses: `affected`, `not_affected`, `fixed`, `under_investigation`.
* Prefer **justifications** over free-text.
* Time-bound **waivers** are modeled as VEX with `not_affected` + justification or `affected` + compensating controls.
* Dashboards MUST surface counts separately for `under_investigation` so risk is visible.
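
A dashboard feed along these lines could be computed as in the following Python sketch. It assumes each VEX statement is a dict with a `status` key, as in the OpenVEX skeleton; `status_counts` is a hypothetical helper name.

```python
from collections import Counter

# The four OpenVEX status values listed above.
KNOWN_STATUSES = ("affected", "not_affected", "fixed", "under_investigation")


def status_counts(statements):
    """Count VEX statements per status, keeping under_investigation separate."""
    counts = Counter(s["status"] for s in statements)
    unknown = set(counts) - set(KNOWN_STATUSES)
    if unknown:
        # Fail loudly rather than hiding statements with an unexpected status.
        raise ValueError(f"unexpected VEX status values: {sorted(unknown)}")
    return {status: counts.get(status, 0) for status in KNOWN_STATUSES}


if __name__ == "__main__":
    sample = [
        {"status": "affected"},
        {"status": "under_investigation"},
        {"status": "under_investigation"},
    ]
    print(status_counts(sample))
```

Returning every status, including those with a zero count, keeps the dashboard columns stable and makes a growing `under_investigation` backlog easy to spot.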
---
## 12) Operational Guidance
**Key management**
* Use **ephemeral OIDC** or short-lived keys (HSM/KMS bound).
* Rotate signer identities at least quarterly; no shared long-term keys in CI.
**Caching & performance**
* Layer caches keyed by digest + analyzer version.
* Pre-warm vuln DB snapshots; mirror into air-gapped envs.
**Multi-tenancy**
* Strict tenant isolation for storage and compute.
* Rate-limit and bound memory/CPU per scan job.
**Auditing**
* Every decision is a record: inputs, versions, rule commit, actor, result.
* Preserve Rekor inclusion proofs with the attestation record.
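
Preserved inclusion proofs are only useful if they can be re-checked offline. The sketch below implements the generic Merkle inclusion check described in RFC 6962 / RFC 9162, which is the tree construction Rekor's log is understood to follow. It is not Rekor client code: parsing a real Rekor entry (hex decoding, checkpoint verification) is left out, and the function names are illustrative.

```python
import hashlib


def leaf_hash(data: bytes) -> bytes:
    """RFC 6962 leaf hash: SHA-256 over 0x00 || data."""
    return hashlib.sha256(b"\x00" + data).digest()


def node_hash(left: bytes, right: bytes) -> bytes:
    """RFC 6962 interior node hash: SHA-256 over 0x01 || left || right."""
    return hashlib.sha256(b"\x01" + left + right).digest()


def verify_inclusion(leaf_index, tree_size, leaf, proof, root) -> bool:
    """Recompute the root from a leaf hash and its audit path (RFC 9162, 2.1.3.2)."""
    if leaf_index < 0 or leaf_index >= tree_size:
        return False
    fn, sn = leaf_index, tree_size - 1
    r = leaf
    for p in proof:
        if sn == 0:
            return False  # proof is longer than the tree allows
        if (fn & 1) == 1 or fn == sn:
            r = node_hash(p, r)
            # Skip levels where this node is a right-edge node without a sibling.
            while (fn & 1) == 0 and fn != 0:
                fn >>= 1
                sn >>= 1
        else:
            r = node_hash(r, p)
        fn >>= 1
        sn >>= 1
    return sn == 0 and r == root


if __name__ == "__main__":
    h = [leaf_hash(x) for x in (b"a", b"b", b"c")]
    root = node_hash(node_hash(h[0], h[1]), h[2])
    print(verify_inclusion(2, 3, h[2], [node_hash(h[0], h[1])], root))
```

Storing the leaf index, tree size, audit path and signed root next to the attestation record is what allows this check to run in an air-gapped environment.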
---
## 13) Exceptions Process (Break-glass)
1. Open a tracked exception with: artifact digest, CVE(s), business justification, expiry.
2. Generate VEX entry reflecting the exception (`not_affected` with justification or `affected` with compensating controls).
3. Merge into policy inputs; **policy MUST read VEX**, not tickets.
4. Re-review before expiry; exceptions cannot auto-renew.
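
The "no auto-renew" rule can be enforced mechanically when policy inputs are assembled. The Python sketch below assumes a hypothetical exception record with the four fields from step 1 (`artifact_digest`, `cves`, `justification`, `expiry`); the field names and the ISO 8601 timestamps with explicit offsets are assumptions made for illustration.

```python
from datetime import datetime, timezone

REQUIRED_FIELDS = ("artifact_digest", "cves", "justification", "expiry")


def active_exceptions(exceptions, now):
    """Keep only well-formed exceptions whose expiry lies strictly in the future."""
    active = []
    for exc in exceptions:
        missing = [f for f in REQUIRED_FIELDS if not exc.get(f)]
        if missing:
            raise ValueError(f"exception record is missing fields: {missing}")
        # Expired exceptions simply drop out; nothing extends them implicitly.
        if datetime.fromisoformat(exc["expiry"]) > now:
            active.append(exc)
    return active


if __name__ == "__main__":
    now = datetime(2025, 11, 1, tzinfo=timezone.utc)
    records = [
        {
            "artifact_digest": "sha256:IMAGE_DIGEST_SHA256",
            "cves": ["CVE-2025-0001"],
            "justification": "compensating network controls",
            "expiry": "2025-12-01T00:00:00+00:00",
        }
    ]
    print(len(active_exceptions(records, now)))
```

Only the exceptions returned here would be turned into VEX entries for the gate, so an expired exception stops influencing policy without anyone having to delete it.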
---
## 14) Threat Model (Abbreviated)
* **Tampering**: modified SBOMs/attestations → mitigated by DSSE + Rekor + WORM storage.
* **Confused deputy**: scanning a different image → mitigated by digest-only pulls and subject digests in attestations.
* **TOCTOU / retagging**: registry tags drift → mitigated by digest pinning everywhere.
* **Scanner poisoning**: unpinned DBs → mitigated by snapshotting and recording version/date.
* **Key compromise**: long-lived CI keys → mitigated by OIDC keyless or short-lived KMS keys.
---
## 15) Implementation Checklist
* [ ] SBOM producer emits CycloneDX 1.6; bound to image digest.
* [ ] in-toto + DSSE signing wired in CI; Rekor logging enabled.
* [ ] Durable artifact store with WORM semantics.
* [ ] Scanner produces explainable findings; SARIF optional.
* [ ] OpenVEX emitted and archived; linked to findings & image.
* [ ] Policy gate enforced; waivers modeled as VEX; decisions logged.
* [ ] Air-gap mirrors for registry and vuln DBs.
* [ ] Runbooks for key rotation, Rekor outage, and database rollback.
---
## 16) Glossary
* **SBOM**: Software Bill of Materials describing packages/components within an artifact.
* **Attestation**: Signed statement binding facts (predicate) to a subject (artifact) using in-toto.
* **DSSE**: Envelope that signs the canonical payload detached from transport.
* **Transparency Log**: Append-only log (e.g., Rekor) giving inclusion and temporal proofs.
* **VEX**: Vulnerability Exploitability eXchange expressing exploitability status & justification.
---
## 8 · Change Log
| Version | Date        | Note (high-level)                                                                                     |
| ------- | ----------- | ----------------------------------------------------------------------------------------------------- |
| v1.4    | 29-Oct-2025 | Initial principles, golden path, policy examples, and JSON skeletons.                                 |
| v1.4    | 14-Jul-2025 | First public revision reflecting quarterly roadmap & KPI baseline.                                    |
| v1.3    | 12-Jul-2025 | Expanded ecosystem pillar, added metrics/integrations, refined non-goals, community persona/feedback. |
| v1.2    | 11-Jul-2025 | Restructured to link with WHY; merged principles into Strategic Pillars; added review §7              |


@@ -0,0 +1,49 @@
# Cartographer Graph Handshake Plan
_Status: 2025-10-29_
## Why this exists
The Concelier/Excititor graph enrichment work (CONCELIER-GRAPH-21-001/002, EXCITITOR-GRAPH-21-001/002/005) and the merge-side coordination tasks (FEEDMERGE-COORD-02-901/902) are blocked on a clear contract with Cartographer and the Policy Engine. This document captures the minimum artefacts each guild owes so we can unblock the graph pipeline and resume implementation without re-scoping every stand-up.
## Deliverables by guild
### Cartographer Guild
- **CARTO-GRAPH-21-002** (Inspector contract): publish the inspector payload schema (`graph.inspect.v1`) including the fields Cartographer needs from Concelier/Excititor (SBOM relationships, advisory/VEX linkouts, justification summaries). Target format: shared Proto/JSON schema stored under `src/Cartographer/Contracts/`.
- **CARTO-GRAPH-21-005** (Inspector access patterns): document the query shapes Cartographer will execute (PURL → advisory, PURL → VEX statement, policy scope filters) so storage can project the right indexes/materialized views. Include sample `mongosh` queries and desired TTL/limit behaviour.
- Provide a test harness (e.g., Postman collection or integration fixture) Cartographer will use to validate the Concelier/Excititor endpoints once they land.
### Concelier Core Guild
- Derive adjacency data from SBOM normalization as described in CONCELIER-GRAPH-21-001 (depends on `CONCELIER-POLICY-20-002`). Once Cartographer publishes the schema above, implement:
- Node payloads: component metadata, scopes, entrypoint annotations.
- Edge payloads: `contains`, `depends_on`, `provides`, provenance array.
- **Change events (CONCELIER-GRAPH-21-002)**: define the `sbom.relationship.changed` event contract with tenant + context metadata, referencing Cartographer's filter requirements. Include event samples and replay instructions in `docs/graph/concelier-events.md`.
- Coordinate with Cartographer on pagination/streaming expectations (page size, continuation token, retention window).
### Excititor Core & Storage Guilds
- **Inspector linkouts (EXCITITOR-GRAPH-21-001)**: expose a batched VEX/advisory lookup endpoint that accepts graph node PURLs and responds with raw document slices + justification metadata. Ensure Policy Engine scope enrichment (EXCITITOR-POLICY-20-002) feeds this response so Cartographer does not need to call multiple services.
- **Overlay enrichment (EXCITITOR-GRAPH-21-002)**: align the overlay metadata with Cartographer's schema once it lands (include justification summaries, document versions, and provenance).
- **Indexes/materialized views (EXCITITOR-GRAPH-21-005)**: after Cartographer publishes query shapes, create the necessary indexes (PURL + tenant, policy scope) and document migrations in storage runbooks. Provide load testing evidence before enabling in production.
### Policy Guild
- **CONCELIER-POLICY-20-002**: publish the enriched linkset schema that powers both Concelier and Excititor payloads. Include enumerations for relationship types and scope tags.
- Share the Policy Engine timeline for policy overlay metadata (`POLICY-ENGINE-30-001`) so Excititor can plan the overlay enrichment delivery.
## Shared action items
| Owner | Task | Deadline | Notes |
|-------|------|----------|-------|
| Cartographer | Publish inspector schema + query patterns (`CARTO-GRAPH-21-002`/`21-005`) | 2025-11-04 | Attach schema files + examples to this doc once merged. |
| Concelier Core | Draft change-event payload with sample JSON | 2025-11-06 | Blocked until Cartographer schema lands; prepare skeleton PR in `docs/graph/concelier-events.md`. |
| Excititor Core/Storage | Prototype batch linkout API + index design doc | 2025-11-07 | Leverage Cartographer query patterns to size indexes; include perf targets. |
| Policy Guild | Confirm linkset enrichment fields + overlay timeline | 2025-11-05 | Needed to unblock both Concelier enrichment and Excititor overlay tasks. |
## Reporting
- Track progress in the `#cartographer-handshake` Slack thread (create once Cartographer posts the schema MR).
- During the twice-weekly graph sync, review outstanding checklist items above and update the task notes (`TASKS.md`) so the backlog reflects real-time status.
- Once the schema and query contracts are merged, the Concelier/Excititor teams can flip their tasks from **BLOCKED** to **DOING** and attach implementation plans referencing this document.
## Appendix: references
- `CONCELIER-GRAPH-21-001`, `CONCELIER-GRAPH-21-002` (Concelier Core task board)
- `EXCITITOR-GRAPH-21-001`, `EXCITITOR-GRAPH-21-002`, `EXCITITOR-GRAPH-21-005` (Excititor Core/Storage task boards)
- `CARTO-GRAPH-21-002`, `CARTO-GRAPH-21-005` (Cartographer task board)
- `POLICY-ENGINE-30-001`, `CONCELIER-POLICY-20-002`, `EXCITITOR-POLICY-20-002` (Policy Engine roadmap)


@@ -0,0 +1,37 @@
# Java Analyzer Observation Writer Plan
_Status: 2025-10-29_
SCANNER-ANALYZERS-JAVA-21-008 (resolver + AOC writer) is blocked by upstream heuristics that need to settle before we can emit observation JSON. This note itemises the remaining work so the analyzer guild can sequence delivery without re-opening design discussions in every stand-up.
## Prerequisite summary
- **SCANNER-ANALYZERS-JAVA-21-004** (reflection / dynamic loader heuristics) must emit normalized reflection edges with confidence + call-site metadata. Outstanding items: TCCL coverage for servlet containers and resource-based plugin hints. Owners: Java Analyzer Guild.
- **SCANNER-ANALYZERS-JAVA-21-005** (framework config extraction) is required to surface the Spring/Jakarta entrypoints that feed observation entrypoint metadata. Add YAML/property parsing fixtures and document reason codes (`config-spring`, `config-jaxrs`, etc.).
- **SCANNER-ANALYZERS-JAVA-21-006** (JNI/native hints) is optional but highly recommended before the observation writer, so that JNI edges land alongside static ones. Coordinate with the native analyzer on reason codes.
- **Advisory core**: ensure the AOC writer schema (`JavaObservation.json`) is frozen before we serialise, to avoid churn downstream.
## Deliverables for SCANNER-ANALYZERS-JAVA-21-008
1. **Observation projection (`JavaObservationWriter`)**
- Inputs: normalised workspace + analyzer outputs (classpath graph, SPI table, reflection edges, config hints, JNI hints).
- Outputs: deterministic JSON containing entrypoints, components, edges, warnings, provenance. Align with `docs/aoc/java-observation-schema.md` once published.
2. **AOC guard integration**
- Serialize observation documents through `Scanner.Aoc` guard pipeline; add unit tests covering required fields and forbidden derived data.
3. **Fixture updates**
- Expand `fixtures/lang/java/` set to include reflection-heavy app, Spring Boot sample, JNI sample, modular app. Record golden outputs with `UPDATE_JAVA_FIXTURES=1`.
4. **Metrics & logging**
- Emit counters (`scanner.java.observation.edges_total`, etc.) to trace observation completeness during CI runs.
5. **Documentation**
- Update `docs/scanner/java-analyzer.md` with reason code matrix and observation field definitions.
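
The "deterministic JSON" requirement in deliverable 1 mostly comes down to fixing the order of everything that has no natural order. The Python sketch below illustrates the idea only; the real writer is a .NET component, and the edge fields (`from`, `to`, `kind`) are hypothetical until the observation schema is published.

```python
import json


def write_observation(observation: dict) -> str:
    """Serialise an observation so equal inputs give byte-identical output."""
    doc = dict(observation)
    # Sort collections whose order carries no meaning (hypothetical field names).
    doc["edges"] = sorted(
        observation.get("edges", []),
        key=lambda e: (e["from"], e["to"], e["kind"]),
    )
    doc["warnings"] = sorted(observation.get("warnings", []))
    # Sorted keys and fixed separators remove the remaining sources of drift.
    return json.dumps(doc, sort_keys=True, separators=(",", ":"), ensure_ascii=False)


if __name__ == "__main__":
    obs = {
        "entrypoints": ["com.example.Main"],
        "edges": [
            {"from": "b", "to": "c", "kind": "reflection"},
            {"from": "a", "to": "b", "kind": "static"},
        ],
        "warnings": [],
    }
    print(write_observation(obs))
```

Golden fixtures recorded from such output stay stable across runs, so a fixture diff signals a real analyzer change rather than iteration-order noise.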
## Action items
| Owner | Task | Due | Notes |
|-------|------|-----|-------|
| Java Analyzer Guild | Land reflection TODOs (TCCL + resource plugin hints) | 2025-11-01 | Required for reliable dynamic edges. |
| Java Analyzer Guild | Finish config extractor for Spring/Jakarta | 2025-11-02 | Use sample apps in `fixtures/lang/java/config-*`. |
| Java Analyzer Guild | Draft observation writer spike PR using new schema | 2025-11-04 | PR can be draft but should include JSON schema + sample. |
| Scanner AOC Owners | Validate observation JSON against AOC guard + schema | 2025-11-05 | Blocker for marking 21-008 as DOING. |
| QA Guild | Prepare regression harness + performance gate (<300ms per fat jar) | 2025-11-06 | Align with SCANNER-ANALYZERS-JAVA-21-009. |
## Reporting
- Track these checkpoints in the Java analyzer weekly sync; once prerequisites are green, flip SCANNER-ANALYZERS-JAVA-21-008 to **DOING**.
- Store schema and sample output under `docs/scanner/java-observations/` so AOC reviewers have a stable reference.


@@ -0,0 +1,94 @@
# Normalized Version Rule Recipes
_Status: 2025-10-29_
This guide captures the minimum wiring required for connectors and Merge coordination tasks to finish the normalized-version rollout that unblocks FEEDMERGE-COORD-02-9xx.
## 1. Quick-start checklist
1. Ensure your mapper already emits `AffectedPackage.VersionRanges` (SemVer, NEVRA, EVR). If you only have vendor/product strings, capture the raw range text before trimming so it can feed the helper.
2. Call `SemVerRangeRuleBuilder.BuildNormalizedRules(rawRange, patchedVersion, provenance)` for each range and place the result in `AffectedPackage.NormalizedVersions`.
3. Set a provenance note in the format `connector:{advisoryId}:{index}` so Merge can differentiate connector-provided rules from canonical fallbacks.
4. Verify with `dotnet test` that the connector snapshot fixtures now include the `normalizedVersions` array and update fixtures by setting the connector-specific `UPDATE_*_FIXTURES=1` environment variable.
5. Tail Merge logs (or the test output) for the new warning `Normalized version rules missing for {AdvisoryKey}`; an empty warning stream means the connector/merge artefacts are ready to close FEEDMERGE-COORD-02-901/902.
## 2. Code snippet: SemVer connector (CCCS/Cisco/ICS-CISA)
```csharp
using StellaOps.Concelier.Normalization.SemVer;

private static IReadOnlyList<AffectedPackage> BuildPackages(MyDto dto, DateTimeOffset recordedAt)
{
    var packages = new List<AffectedPackage>();

    foreach (var entry in dto.AffectedEntries.Select((value, index) => (value, index)))
    {
        var rangeText = entry.value.Range?.Trim();
        var patched = entry.value.FixedVersion;
        var provenance = $"{MyConnectorPlugin.SourceName}:{dto.AdvisoryId}:{entry.index}";

        var normalizedRules = SemVerRangeRuleBuilder.BuildNormalizedRules(rangeText, patched, provenance);
        var primitives = SemVerRangeRuleBuilder.Build(rangeText, patched, provenance)
            .Select(result => result.Primitive.ToAffectedVersionRange(provenance))
            .ToArray();

        packages.Add(new AffectedPackage(
            AffectedPackageTypes.SemVer,
            entry.value.PackageId,
            versionRanges: primitives,
            normalizedVersions: normalizedRules,
            provenance: new[]
            {
                new AdvisoryProvenance(
                    MyConnectorPlugin.SourceName,
                    "package",
                    entry.value.PackageId,
                    recordedAt,
                    new[] { ProvenanceFieldMasks.AffectedPackages })
            }));
    }

    return packages;
}
```
A few notes:
- If you already have `SemVerPrimitive` instances, call `.ToNormalizedVersionRule(provenance)` on each primitive instead of rebuilding from raw strings.
- Use `SemVerRangeRuleBuilder.BuildNormalizedRules` when the connector only tracks raw range text plus an optional fixed/patched version.
- For products that encode ranges like `"ExampleOS 4.12 - 4.14"`, run a small regex to peel off the version substring (see §3) and use the same provenance note when emitting the rule and the original range primitive.
## 3. Parsing helper for trailing version phrases
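Many of the overdue connectors store affected products as natural-language phrases. The following helper extracts the version fragment from common patterns (`1.2 - 1.4`, `<= 3.5`, `Version 7.2 and later`). For the last form it captures only `7.2`, so the open-ended "and later" qualifier has to be handled separately.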
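```csharp
using System.Text.RegularExpressions;

private static string? TryExtractRangeSuffix(string productString)
{
    if (string.IsNullOrWhiteSpace(productString))
    {
        return null;
    }

    // Verbatim string (@"...") so that \s and \d reach the regex engine instead of
    // being rejected by the compiler as unrecognised escape sequences.
    var match = Regex.Match(
        productString,
        @"(?<range>(?:<=?|>=?)?\s*\d+(?:\.\d+){0,2}(?:\s*-\s*\d+(?:\.\d+){0,2})?)",
        RegexOptions.CultureInvariant);

    return match.Success ? match.Groups["range"].Value.Trim() : null;
}
```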
Once you extract the `range` fragment, feed it to `SemVerRangeRuleBuilder.BuildNormalizedRules(range, null, provenance)`. Keep the original product string as-is so operators can still see the descriptive text.
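
To see what the pattern captures before wiring it into a connector, the same expression can be tried in Python. This is a port for experimentation only (Python spells the named group `(?P<range>...)`); `try_extract_range_suffix` is a hypothetical name mirroring the C# helper.

```python
import re

# Same pattern as the C# helper, with Python's named-group syntax.
RANGE_PATTERN = re.compile(
    r"(?P<range>(?:<=?|>=?)?\s*\d+(?:\.\d+){0,2}(?:\s*-\s*\d+(?:\.\d+){0,2})?)"
)


def try_extract_range_suffix(product_string):
    """Return the first version or range fragment in a product phrase, or None."""
    if not product_string or not product_string.strip():
        return None
    match = RANGE_PATTERN.search(product_string)
    return match.group("range").strip() if match else None


if __name__ == "__main__":
    for phrase in ("ExampleOS 4.12 - 4.14", "ExampleOS <= 3.5", "Version 7.2 and later"):
        print(repr(phrase), "->", repr(try_extract_range_suffix(phrase)))
```

The first two phrases yield `4.12 - 4.14` and `<= 3.5`; the third yields only `7.2`, which is a reminder that open-ended wording needs connector-specific handling before the fragment is passed on.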
## 4. Merge dashboard hygiene
- Run `dotnet build src/Concelier/__Libraries/StellaOps.Concelier.Merge/StellaOps.Concelier.Merge.csproj` after wiring a connector to confirm no warnings appear.
- Merge counter tag pairs to watch in Grafana/CI logs:
- `concelier.merge.normalized_rules{package_type="npm"}` increases once the connector emits normalized arrays.
- `concelier.merge.normalized_rules_missing{package_type="vendor"}` should trend to zero once rollout completes.
- The Merge service now logs `Normalized version rules missing for {AdvisoryKey}; sources=...; packageTypes=...` when a connector still needs to supply normalized rules. Use this as the acceptance gate for FEEDMERGE-COORD-02-901/902.
## 5. Documentation touchpoints
- Update the connector `TASKS.md` entry with the date you flipped on normalized rules and note the provenance format you chose.
- Record any locale-specific parsing (e.g., German `bis`) in the connector README so future contributors can regenerate fixtures confidently.
- When opening the PR, include `dotnet test` output covering the connector tests so reviewers see the normalized array diff.
Once each connector follows the steps above, we can mark FEEDCONN-CCCS-02-009, FEEDCONN-CISCO-02-009, FEEDCONN-CERTBUND-02-010, FEEDCONN-ICSCISA-02-012, and the FEEDMERGE-COORD-02-90x tasks as resolved.


@@ -50,7 +50,7 @@
| `content.format` | string | Source format (`CSAF`, `OSV`, etc.). |
| `content.spec_version` | string | Upstream spec version when known. |
| `content.raw` | object | Full upstream payload, untouched except for transport normalisation. |
| `identifiers` | object | Normalised identifiers (`cve`, `ghsa`, `aliases`, etc.) derived losslessly from raw content. |
| `identifiers` | object | Upstream identifiers (`cve`, `ghsa`, `aliases`, etc.) captured as provided (trimmed, order preserved, duplicates allowed). |
| `linkset` | object | Join hints (see section 4.3). |
| `supersedes` | string or null | Points to previous revision of same upstream doc when content hash changes. |
@@ -79,6 +79,7 @@
Canonicalisation rules:
- Package URLs are rendered in canonical form without qualifiers/subpaths (`pkg:type/namespace/name@version`).
- CPE values are normalised to the 2.3 binding (`cpe:2.3:part:vendor:product:version:*:*:*:*:*:*:*`).
- Connector mapping stages are responsible for the canonical form; ingestion trims whitespace but otherwise preserves the original order and duplicate entries so downstream policy can reason about upstream intent.
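
As a rough illustration of the first rule, a connector mapping stage could drop qualifiers and subpaths as in the Python sketch below. It relies only on the general purl layout (`pkg:type/namespace/name@version?qualifiers#subpath`); `canonical_purl` is a hypothetical helper, and a real connector would more likely use a purl library that also validates the input.

```python
def canonical_purl(purl: str) -> str:
    """Strip the '#subpath' and '?qualifiers' parts from a package URL."""
    # The subpath comes last, so remove it first, then the qualifiers.
    without_subpath = purl.split("#", 1)[0]
    return without_subpath.split("?", 1)[0]


if __name__ == "__main__":
    print(canonical_purl("pkg:npm/%40angular/core@12.0.0?arch=amd64#src/index.js"))
```

Literal `?` and `#` characters inside a name or version are percent-encoded in a well-formed purl, which is why a plain split is sufficient for this sketch.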
### 4.4 `advisory_observations`
@@ -99,10 +100,10 @@ Canonicalisation rules:
| `content.format` / `content.specVersion` | string | Raw payload format metadata (CSAF, OSV, JSON, etc.). |
| `content.raw` | object | Full upstream document stored losslessly (Relaxed Extended JSON). |
| `content.metadata` | object | Optional connector-specific metadata (batch ids, hints). |
| `linkset.aliases` | array | Normalized aliases (lower-case, sorted). |
| `linkset.purls` | array | Normalized PURLs extracted from the document. |
| `linkset.cpes` | array | Normalized CPE URIs. |
| `linkset.references` | array | `{ type, url }` pairs (type lower-case). |
| `linkset.aliases` | array | Connector-supplied aliases (trimmed, order preserved, duplicates allowed). |
| `linkset.purls` | array | Connector-supplied PURLs (ingestion preserves order and duplicates). |
| `linkset.cpes` | array | Connector-supplied CPE URIs (trimmed, order preserved). |
| `linkset.references` | array | `{ type, url }` pairs (trimmed; ingestion preserves order). |
| `createdAt` | datetime | Timestamp when Concelier persisted the observation. |
| `attributes` | object | Optional provenance attributes keyed by connector. |
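
The revised linkset semantics (trim only, keep order and duplicates) differ from the earlier normalised form (lower-case, sorted). The Python sketch below contrasts the two; both function names are hypothetical, and the de-duplication in the second function is an assumption about the earlier behaviour that the table does not state.

```python
def ingest_values(values):
    """New behaviour: trim whitespace, keep upstream order and duplicates."""
    return [v.strip() for v in values]


def normalise_values(values):
    """Earlier behaviour, shown for contrast: lower-case and sort.

    De-duplication here is an assumption, not something the schema states.
    """
    return sorted({v.strip().lower() for v in values})


if __name__ == "__main__":
    upstream = [" GHSA-xxxx-yyyy-zzzz", "CVE-2025-0001", "CVE-2025-0001 "]
    print(ingest_values(upstream))
    print(normalise_values(upstream))
```

Keeping the upstream order and repeats means downstream policy can still tell which alias a source listed first and how often, information the sorted form discards.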