feat: Implement BerkeleyDB reader for RPM databases

- Added BerkeleyDbReader class to read and extract RPM header blobs from BerkeleyDB hash databases.
- Implemented methods to detect BerkeleyDB format and extract values, including handling of page sizes and magic numbers.
- Added tests for BerkeleyDbReader to ensure correct functionality and header extraction.
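
The magic-number detection mentioned above can be pictured with a quick shell probe (an illustrative sketch, not the reader's actual code; the `/var/lib/rpm/Packages` path and the constants are the standard BerkeleyDB hash-database values):

```
# Sketch only: check the BerkeleyDB magic number the same way the format detection
# is described. The metadata page stores a 4-byte magic at byte offset 12;
# 0x00061561 marks a Hash database (the format rpm's legacy Packages file uses).
magic=$(od -An -tx4 -j12 -N4 /var/lib/rpm/Packages | tr -d ' ')
case "$magic" in
  00061561|61150600) echo "BerkeleyDB hash database (covers both on-disk byte orders)";;
  *)                 echo "not a BerkeleyDB hash database (magic=$magic)";;
esac
```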

feat: Add Yarn PnP data tests

- Created YarnPnpDataTests to validate package resolution and data loading from Yarn PnP cache.
- Implemented tests for resolved keys, package presence, and loading from cache structure.

test: Add egg-info package fixtures for Python tests

- Created egg-info package fixtures for testing Python analyzers.
- Included PKG-INFO, entry_points.txt, and installed-files.txt for comprehensive coverage.

test: Enhance RPM database reader tests

- Added tests for RpmDatabaseReader to validate fallback to legacy packages when SQLite is missing.
- Implemented helper methods to create legacy package files and RPM headers for testing.
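
The fallback scenario those tests cover corresponds to hosts where the SQLite rpmdb is absent; a rough shell sketch of that condition (file names are the conventional rpm locations, not taken from the tests themselves):

```
# Sketch: which rpmdb backend a host actually has — the situation the fallback handles.
if [ -f /var/lib/rpm/rpmdb.sqlite ]; then
  echo "sqlite rpmdb present"
elif [ -f /var/lib/rpm/Packages ]; then
  echo "legacy BerkeleyDB rpmdb only (fallback path)"
else
  echo "no rpm database found"
fi
```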

test: Implement dual signing tests

- Added DualSignTests to validate secondary signature addition when configured.
- Created stub implementations for crypto providers and key resolvers to facilitate testing.

chore: Update CI script for Playwright Chromium installation

- Modified ci-console-exports.sh to ensure a deterministic Chromium binary installation for the console exports tests.
- Added Windows compatibility checks and environment variable setup for Playwright browsers.
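
The deterministic install described above generally comes down to pinning where Playwright keeps its browsers (a sketch only; the cache path is illustrative and not the script's actual value):

```
# Sketch: pin Playwright's browser cache location so the Chromium install is reproducible.
# PLAYWRIGHT_BROWSERS_PATH is Playwright's standard cache override; the path is illustrative.
export PLAYWRIGHT_BROWSERS_PATH="$PWD/.cache/ms-playwright"
npx playwright install chromium
```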
StellaOps Bot
2025-12-07 16:24:45 +02:00
parent e3f28a21ab
commit 11597679ed
199 changed files with 9809 additions and 4404 deletions


@@ -1,76 +1,82 @@
-# Concelier Backfill & Rollback Plan (STORE-AOC-19-005-DEV)
+# Concelier Backfill & Rollback Plan (STORE-AOC-19-005-DEV, Postgres)
## Objective
-Prepare and rehearse the raw-linkset backfill/rollback so Concelier Mongo reflects Link-Not-Merge data deterministically across dev/stage. This runbook unblocks STORE-AOC-19-005-DEV.
+Prepare and rehearse the raw Link-Not-Merge backfill/rollback so Concelier Postgres reflects the dataset deterministically across dev/stage. This replaces the prior Mongo workflow.
## Inputs
-- Source dataset: staging export tarball `linksets-stage-backfill.tar.zst`.
-- Expected placement: `out/linksets/linksets-stage-backfill.tar.zst`.
-- Hash: record SHA-256 in this file once available (example below).
-Example hash capture (replace with real):
-```
-$ sha256sum out/linksets/linksets-stage-backfill.tar.zst
-3ac7d1c8f4f7b5c5b27c1c7ac6d6e9b2a2d6d7a1a1c3f4e5b6c7d8e9f0a1b2c3 out/linksets/linksets-stage-backfill.tar.zst
-```
+- Dataset tarball: `out/linksets/linksets-stage-backfill.tar.zst`
+- Files expected inside: `linksets.ndjson`, `advisory_chunks.ndjson`, `manifest.json`
+- Record SHA-256 of the tarball here when staged:
+```
+$ sha256sum out/linksets/linksets-stage-backfill.tar.zst
+2b43ef9b5694f59be8c1d513893c506b8d1b8de152d820937178070bfc00d0c0 out/linksets/linksets-stage-backfill.tar.zst
+```
+- To regenerate the tarball deterministically from repo seeds: `./scripts/concelier/build-store-aoc-19-005-dataset.sh`
+- To validate a tarball locally (counts + hashes): `./scripts/concelier/test-store-aoc-19-005-dataset.sh out/linksets/linksets-stage-backfill.tar.zst`
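
A small supplement to the inputs above (a sketch, not part of the committed runbook): fail fast if the staged tarball does not match the digest recorded here.

```
# Sketch: verify the staged tarball against the SHA-256 recorded above before importing.
expected="2b43ef9b5694f59be8c1d513893c506b8d1b8de152d820937178070bfc00d0c0"
actual=$(sha256sum out/linksets/linksets-stage-backfill.tar.zst | awk '{print $1}')
[ "$actual" = "$expected" ] || { echo "tarball hash mismatch: $actual" >&2; exit 1; }
```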
## Preflight
-- Environment variables:
-  - `CONCELIER_MONGO_URI` pointing to the target (dev or staging) Mongo.
-  - `CONCELIER_DB` (default `concelier`).
-- Take a snapshot of affected collections:
-```
-mongodump --uri "$CONCELIER_MONGO_URI" --db "$CONCELIER_DB" --collection linksets --collection advisory_chunks --out out/backups/pre-run
-```
-- Ensure write lock is acceptable for the maintenance window.
+- Env:
+  - `PGURI` (or `CONCELIER_PG_URI`) pointing to the target Postgres instance.
+  - `PGSCHEMA` (default `lnm_raw`) for staging tables.
+- Ensure maintenance window for bulk import; no concurrent writers to staging tables.
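
One way to sanity-check the "no concurrent writers" condition before importing (a sketch, assuming the default `lnm_raw` schema; adjust to `$PGSCHEMA` if overridden):

```
# Sketch: list non-idle sessions whose queries touch the staging schema.
psql -tA "$PGURI" -c \
  "select pid, state, left(query, 80) from pg_stat_activity where pid <> pg_backend_pid() and state <> 'idle' and query ilike '%lnm_raw%';"
```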
-## Backfill steps
+## Backfill steps (CI-ready)
+### Preferred: CI/manual script
+- `scripts/concelier/backfill-store-aoc-19-005.sh /path/to/linksets-stage-backfill.tar.zst`
+- Env: `PGURI` (or `CONCELIER_PG_URI`), optional `PGSCHEMA` (default `lnm_raw`), optional `DRY_RUN=1` for extraction-only.
+- The script:
+  - Extracts and validates required files.
+  - Creates/clears staging tables (`<schema>.linksets_raw`, `<schema>.advisory_chunks_raw`).
+  - Imports via `\copy` from TSV derived with `jq -rc '[._id, .] | @tsv'`.
+  - Prints counts and echoes the manifest.
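
An example invocation of the preferred path (a sketch under the env vars named above; the connection string is a placeholder):

```
# Sketch: extraction-only pass first, then the real run.
export PGURI="postgres://concelier@stage-db:5432/concelier"   # placeholder DSN
export PGSCHEMA="lnm_raw"
DRY_RUN=1 scripts/concelier/backfill-store-aoc-19-005.sh out/linksets/linksets-stage-backfill.tar.zst
scripts/concelier/backfill-store-aoc-19-005.sh out/linksets/linksets-stage-backfill.tar.zst
```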
+### Manual steps (fallback)
1) Extract dataset:
```
mkdir -p out/linksets/extracted
tar -xf out/linksets/linksets-stage-backfill.tar.zst -C out/linksets/extracted
```
-2) Import linksets + chunks (bypass validation to preserve upstream IDs):
+2) Create/truncate staging tables and import:
```
-mongoimport --uri "$CONCELIER_MONGO_URI" --db "$CONCELIER_DB" \
-  --collection linksets --file out/linksets/extracted/linksets.ndjson --mode=upsert --upsertFields=_id
-mongoimport --uri "$CONCELIER_MONGO_URI" --db "$CONCELIER_DB" \
-  --collection advisory_chunks --file out/linksets/extracted/advisory_chunks.ndjson --mode=upsert --upsertFields=_id
+psql "$PGURI" <<SQL
+create schema if not exists lnm_raw;
+create table if not exists lnm_raw.linksets_raw (id text primary key, raw jsonb not null);
+create table if not exists lnm_raw.advisory_chunks_raw (id text primary key, raw jsonb not null);
+truncate table lnm_raw.linksets_raw;
+truncate table lnm_raw.advisory_chunks_raw;
+\copy lnm_raw.linksets_raw (id, raw) from program 'jq -rc ''[._id, .] | @tsv'' out/linksets/extracted/linksets.ndjson' with (format csv, delimiter E'\\t', quote '\"', escape '\"');
+\copy lnm_raw.advisory_chunks_raw (id, raw) from program 'jq -rc ''[._id, .] | @tsv'' out/linksets/extracted/advisory_chunks.ndjson' with (format csv, delimiter E'\\t', quote '\"', escape '\"');
+SQL
```
3) Verify counts vs manifest:
```
jq '.' out/linksets/extracted/manifest.json
-mongo --quiet "$CONCELIER_MONGO_URI/$CONCELIER_DB" --eval "db.linksets.countDocuments()"
-mongo --quiet "$CONCELIER_MONGO_URI/$CONCELIER_DB" --eval "db.advisory_chunks.countDocuments()"
-```
-4) Dry-run rollback marker (no-op unless `ENABLE_ROLLBACK=1` set):
-```
-ENABLE_ROLLBACK=0 python scripts/concelier/backfill/rollback.py --manifest out/linksets/extracted/manifest.json
+psql -tA "$PGURI" -c "select 'linksets_raw='||count(*) from lnm_raw.linksets_raw;"
+psql -tA "$PGURI" -c "select 'advisory_chunks_raw='||count(*) from lnm_raw.advisory_chunks_raw;"
```
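
If you prefer scripting the count check over eyeballing it, something along these lines works; the manifest field name used here is an assumption, since the manifest schema is not spelled out in the runbook:

```
# Sketch: compare staged row counts against manifest.json.
# ".counts.linksets" is an assumed field name — adjust to the real manifest layout.
expected=$(jq -r '.counts.linksets' out/linksets/extracted/manifest.json)
actual=$(psql -tA "$PGURI" -c "select count(*) from lnm_raw.linksets_raw;")
[ "$expected" = "$actual" ] || echo "linksets_raw count mismatch: manifest=$expected db=$actual" >&2
```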
## Rollback procedure
-- If validation fails, restore from preflight dump:
-```
-mongorestore --uri "$CONCELIER_MONGO_URI" --drop out/backups/pre-run
-```
-- If partial write detected, rerun mongoimport for the affected collection only with `--mode=upsert`.
+- If validation fails: `truncate table lnm_raw.linksets_raw; truncate table lnm_raw.advisory_chunks_raw;` then rerun import.
+- Promotion to production tables should be gated by a separate migration/ETL step; keep staging isolated.
## Validation checklist
-- Hash of tarball matches recorded SHA-256.
-- Post-import counts align with `manifest.json`.
-- Linkset cursor pagination smoke test:
-```
-dotnet test src/Concelier/StellaOps.Concelier.WebService.Tests --filter LinksetsEndpoint_SupportsCursorPagination
-```
-- Storage metrics (if enabled) show non-zero `concelier_storage_import_total` for this window.
+- Tarball SHA-256 recorded above.
+- Counts align with `manifest.json`.
+- API smoke test (Postgres-backed): `dotnet test src/Concelier/StellaOps.Concelier.WebService.Tests --filter LinksetsEndpoint_SupportsCursorPagination` (against Postgres config).
+- Optional: compare sample rows between staging and expected downstream tables.
## Artefacts to record
- Tarball SHA-256 and size.
-- `manifest.json` copy stored alongside tarball.
-- Import log (`out/linksets/import.log`) and validation results.
+- `manifest.json` copy alongside tarball.
+- Import log (capture script output) and validation results.
- Decision: maintenance window and rollback outcome.
+## How to produce the tarball (export from Postgres)
+- Use `scripts/concelier/export-linksets-tarball.sh out/linksets/linksets-stage-backfill.tar.zst`.
+- Env: `PGURI` (or `CONCELIER_PG_URI`), optional `PGSCHEMA`, `LINKSETS_TABLE`, `CHUNKS_TABLE`.
+- The script exports `linksets` and `advisory_chunks` tables to NDJSON, generates `manifest.json`, builds the tarball, and prints the SHA-256.
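
A possible invocation of the export (a sketch; the table names shown are the defaults implied above and may differ, and `PGURI` is assumed to be exported already, as in Preflight):

```
# Sketch: export to a tarball and keep the script's output (including the printed SHA-256).
PGSCHEMA="lnm_raw" LINKSETS_TABLE="linksets" CHUNKS_TABLE="advisory_chunks" \
  scripts/concelier/export-linksets-tarball.sh out/linksets/linksets-stage-backfill.tar.zst \
  | tee out/linksets/export.log
```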
## Owners
-- Concelier Storage Guild (Mongo)
+- Concelier Storage Guild (Postgres)
- AirGap/Backfill reviewers for sign-off