Files
git.stella-ops.org/docs/11_DATA_SCHEMAS.md

197 lines
7.2 KiB
Markdown
Raw Permalink Blame History

This file contains invisible Unicode characters

This file contains invisible Unicode characters that are indistinguishable to humans but may be processed differently by a computer. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

#Data Schemas & Persistence Contracts
*Audience* backend developers, plugin authors, DB admins.
*Scope* describes **Redis**, **MongoDB** (optional), and ondisk blob shapes that power StellaOps.
---
##0Document Conventions
* **CamelCase** for JSON.
* All timestamps are **RFC 3339 / ISO 8601** with `Z` (UTC).
* `⭑` = planned but *not* shipped yet (kept on Feature Matrix “To Do”).
---
##1SBOMWrapper Envelope
Every SBOM blob (regardless of format) is stored on disk or in object storage with a *sidecar* JSON file that indexes it for the scanners.
#### 1.1 JSON Shape
```jsonc
{
"id": "sha256:417f…", // digest of the SBOM *file* itself
"imageDigest": "sha256:e2b9…", // digest of the original container image
"created": "2025-07-14T07:02:13Z",
"format": "trivy-json-v2", // NEW enum: trivy-json-v2 | spdx-json | cyclonedx-json
"layers": [
"sha256:d38b…", // layer digests (ordered)
"sha256:af45…"
],
"partial": false, // true => delta SBOM (only some layers)
"provenanceId": "prov_0291" // ⭑ link to SLSA attestation (Q12026)
}
```
*`format`* **NEW** added to support **multiple SBOM formats**.
*`partial`* **NEW** true when generated via the **delta SBOM** flow (§1.3).
#### 1.2 Filesystem Layout
```
blobs/
├─ 417f… # digest prefix
│   ├─ sbom.json # payload (any format)
│   └─ sbom.meta.json # wrapper (shape above)
```
> **Note** blob storage can point at S3, MinIO, or plain disk; driver plugins adapt.
####1.3Delta SBOM Extension
When `partial: true`, *only* the missing layers have been scanned.
Merging logic inside `scanning` module stitches new data onto the cached full SBOM in Redis.
---
##2Redis Keyspace
| Key pattern | Type | TTL | Purpose |
|-------------------------------------|---------|------|--------------------------------------------------|
| `scan:<digest>` | string | ∞ | Last scan JSON result (as returned by `/scan`) |
| `layers:<digest>` | set | 90d | Layers already possessing SBOMs (delta cache) |
| `policy:active` | string | ∞ | YAML **or** Rego ruleset |
| `quota:<token>` | string | *until next UTC midnight* | Pertoken scan counter for Free tier (333 scans). |
| `policy:history` | list | ∞ | Change audit IDs (see Mongo) |
| `feed:nvd:json` | string | 24h | Normalised feed snapshot |
| `locator:<imageDigest>` | string | 30d | Maps image digest → sbomBlobId |
| `metrics:…` | various | — | Prom / OTLP runtime metrics |
> **Delta SBOM** uses `layers:*` to skip work in <20ms.
> **Quota enforcement** increments `quota:<token>` atomically; when value333 the API returns **429**.
---
##3MongoDB Collections (Optional)
Only enabled when `MONGO_URI` is supplied (for longterm audit).
| Collection | Shape (summary) | Indexes |
|--------------------|------------------------------------------------------------|-------------------------------------|
| `sbom_history` | Wrapper JSON + `replaceTs` on overwrite | `{imageDigest}` `{created}` |
| `policy_versions` | `{_id, yaml, rego, authorId, created}` | `{created}` |
| `attestations` ⭑ | SLSA provenance doc + Rekor log pointer | `{imageDigest}` |
| `audit_log` | Fully rendered RFC 5424 entries (UI & CLI actions) | `{userId}` `{ts}` |
Schema detail for **policy_versions**:
```jsonc
{
"_id": "6619e90b8c5e1f76",
"yaml": "version: 1.0\nrules:\n - …",
"rego": null, // filled when Rego uploaded
"authorId": "u_1021",
"created": "2025-07-14T08:15:04Z",
"comment": "Imported via API"
}
```
---
##4Policy Schema (YAML v1.0)
Minimal viable grammar (subset of OSVSCHEMA ideas).
```yaml
version: "1.0"
rules:
- name: Block Critical
severity: [Critical]
action: block
- name: Ignore Low Dev
severity: [Low, None]
environments: [dev, staging]
action: ignore
expires: "2026-01-01"
- name: Escalate BDU High
sources: [BDU]
severity: [High, Critical]
action: escalate
```
Validation is performed by `policy:mapping.yaml` JSONSchema embedded in backend.
###4.1Rego Variant (Advanced  TODO)
*Accepted but stored asis in `rego` field.*
Evaluated via internal **OPA** sidecar once feature graduates from TODO list.
---
##5SLSA Attestation Schema 
Planned for Q12026 (kept here for early plugin authors).
```jsonc
{
"id": "prov_0291",
"imageDigest": "sha256:e2b9…",
"buildType": "https://slsa.dev/container/v1",
"builder": {
"id": "https://git.stella-ops.ru/ci/stella-runner@sha256:f7b7…"
},
"metadata": {
"invocation": {
"parameters": {"GIT_SHA": "f6a1…"},
"buildStart": "2025-07-14T06:59:17Z",
"buildEnd": "2025-07-14T07:01:22Z"
},
"completeness": {"parameters": true}
},
"materials": [
{"uri": "git+https://git…", "digest": {"sha1": "f6a1…"}}
],
"rekorLogIndex": 99817 // entry in local Rekor mirror
}
```
---
##6Validator Contracts
* For SBOM wrapper `ISbomValidator` (DLL plugin) must return *typed* error list.
* For YAML policies JSONSchema at `/schemas/policyv1.json`.
* For Rego OPA `opa eval --fail-defined` under the hood.
* For **Freetier quotas** `IQuotaService` integration tests ensure `quota:<token>` resets at UTC midnight and produces correct `RetryAfter` headers.
---
##7Migration Notes
1. **Add `format` column** to existing SBOM wrappers; default to `trivy-json-v2`.
2. **Populate `layers` & `partial`** via backfill script (ship with `stellopsctl migrate` wizard).
3. Policy YAML previously stored in Redis → copy to Mongo if persistence enabled.
4. Prepare `attestations` collection (empty) safe to create in advance.
---
##8Open Questions / Future Work
* How to deduplicate *identical* Rego policies differing only in whitespace?
* Embed *GOST 34.112018* digests when users enable Russian crypto suite?
* Should enterprise tiers share the same Redis quota keys or switch to JWT claim`tier != Free` bypass?
* Evaluate slidingwindow quota instead of strict daily reset.
* Consider ratelimit for `/layers/missing` to avoid bruteforce enumeration.
---
##9Change Log
| Date | Note |
|------------|--------------------------------------------------------------------------------|
| 20250714 | **Added:** `format`, `partial`, delta cache keys, YAML policy schema v1.0. |
| 20250712 | **Initial public draft** SBOM wrapper, Redis keyspace, audit collections. |
---