feat: Add RustFS artifact object store and migration tool

- Implemented RustFsArtifactObjectStore for managing artifacts in RustFS.
- Added unit tests for RustFsArtifactObjectStore functionality.
- Created a RustFS migrator tool to transfer objects from S3 to RustFS.
- Introduced policy preview and report models for API integration.
- Added fixtures and tests for policy preview and report functionality.
- Included necessary metadata and scripts for cache_pkg package.
This commit is contained in:
master
2025-10-23 18:53:18 +03:00
parent 5cb3144e5e
commit 70d7fb529e
117 changed files with 4849 additions and 725 deletions

View File

@@ -0,0 +1,155 @@
{
"title": "StellaOps Scanner Analyzer Benchmarks",
"uid": "scanner-analyzer-bench",
"schemaVersion": 38,
"version": 1,
"editable": true,
"timezone": "",
"graphTooltip": 0,
"time": {
"from": "now-24h",
"to": "now"
},
"templating": {
"list": [
{
"name": "datasource",
"type": "datasource",
"query": "prometheus",
"refresh": 1,
"hide": 0,
"current": {}
}
]
},
"annotations": {
"list": []
},
"panels": [
{
"id": 1,
"title": "Max Duration (ms)",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"unit": "ms",
"displayName": "{{scenario}}"
},
"overrides": []
},
"options": {
"legend": {
"displayMode": "table",
"placement": "bottom"
},
"tooltip": {
"mode": "single",
"sort": "none"
}
},
"targets": [
{
"expr": "scanner_analyzer_bench_max_ms",
"legendFormat": "{{scenario}}",
"refId": "A"
},
{
"expr": "scanner_analyzer_bench_baseline_max_ms",
"legendFormat": "{{scenario}} baseline",
"refId": "B"
}
]
},
{
"id": 2,
"title": "Regression Ratio vs Limit",
"type": "timeseries",
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"unit": "percentunit",
"displayName": "{{scenario}}",
"min": 0,
"thresholds": {
"mode": "absolute",
"steps": [
{
"color": "green",
"value": null
},
{
"color": "red",
"value": 20
}
]
}
},
"overrides": []
},
"options": {
"legend": {
"displayMode": "table",
"placement": "bottom"
},
"tooltip": {
"mode": "multi",
"sort": "none"
}
},
"targets": [
{
"expr": "(scanner_analyzer_bench_regression_ratio - 1) * 100",
"legendFormat": "{{scenario}} regression %",
"refId": "A"
},
{
"expr": "(scanner_analyzer_bench_regression_limit - 1) * 100",
"legendFormat": "{{scenario}} limit %",
"refId": "B"
}
]
},
{
"id": 3,
"title": "Breached Scenarios",
"type": "stat",
"datasource": {
"type": "prometheus",
"uid": "${datasource}"
},
"fieldConfig": {
"defaults": {
"displayName": "{{scenario}}",
"unit": "short"
},
"overrides": []
},
"options": {
"colorMode": "value",
"graphMode": "area",
"justifyMode": "center",
"reduceOptions": {
"calcs": [
"last"
],
"fields": "",
"values": false
}
},
"targets": [
{
"expr": "scanner_analyzer_bench_regression_breached",
"legendFormat": "{{scenario}}",
"refId": "A"
}
]
}
]
}

View File

@@ -0,0 +1,48 @@
# Scanner Analyzer Benchmarks Operations Guide
## Purpose
Keep the language analyzer microbench under the <5s SBOM pledge. CI emits Prometheus metrics and JSON fixtures so trend dashboards and alerts stay in lockstep with the repository baseline.
> **Grafana note:** Import `docs/ops/scanner-analyzers-grafana-dashboard.json` into your Prometheus-backed Grafana stack to monitor `scanner_analyzer_bench_*` metrics and alert on regressions.
## Publishing workflow
1. CI (or engineers running locally) execute:
```bash
dotnet run \
--project bench/Scanner.Analyzers/StellaOps.Bench.ScannerAnalyzers/StellaOps.Bench.ScannerAnalyzers.csproj \
-- \
--repo-root . \
--out bench/Scanner.Analyzers/baseline.csv \
--json out/bench/scanner-analyzers/latest.json \
--prom out/bench/scanner-analyzers/latest.prom \
--commit "$(git rev-parse HEAD)" \
--environment "${CI_ENVIRONMENT_NAME:-local}"
```
2. Publish the artefacts (`baseline.csv`, `latest.json`, `latest.prom`) to `bench-artifacts/<date>/`.
3. Promtail (or the CI job) pushes `latest.prom` into Prometheus; JSON lands in long-term storage for workbook snapshots.
4. The harness exits non-zero if:
- `max_ms` for any scenario breaches its configured threshold; or
- `max_ms` regresses ≥20% versus `baseline.csv`.
## Grafana dashboard
- Import `docs/ops/scanner-analyzers-grafana-dashboard.json`.
- Point the template variable `datasource` to the Prometheus instance ingesting `scanner_analyzer_bench_*` metrics.
- Panels:
- **Max Duration (ms)** compares live runs vs baseline.
- **Regression Ratio vs Limit** plots `(max / baseline_max - 1) * 100`.
- **Breached Scenarios** stat panel sourced from `scanner_analyzer_bench_regression_breached`.
## Alerting & on-call response
- **Primary alert**: fire when `scanner_analyzer_bench_regression_ratio{scenario=~".+"} >= 1.20` for 2 consecutive samples (10min default). Suggested PromQL:
```
max_over_time(scanner_analyzer_bench_regression_ratio[10m]) >= 1.20
```
- Suppress duplicates using the `scenario` label.
- Pager payload should include `scenario`, `max_ms`, `baseline_max_ms`, and `commit`.
- Immediate triage steps:
1. Check `latest.json` artefact for the failing scenario confirm commit and environment.
2. Re-run the harness with `--captured-at` and `--baseline` pointing at the last known good CSV to verify determinism.
3. If regression persists, open an incident ticket tagged `scanner-analyzer-perf` and page the owning language guild.
4. Roll back the offending change or update the baseline after sign-off from the guild lead and Perf captain.
Document the outcome in `docs/12_PERFORMANCE_WORKBOOK.md` (section 8) so trendlines reflect any accepted regressions.

View File

@@ -0,0 +1,88 @@
# Scanner Artifact Store Migration (MinIO → RustFS)
## Overview
Sprint 11 introduces **RustFS** as the default artifact store for the Scanner plane. Existing
deployments running MinIO (or any S3-compatible backend) must migrate stored SBOM artefacts to RustFS
before switching the Scanner hosts to `scanner.artifactStore.driver = "rustfs"`.
This runbook covers the recommended migration workflow and validation steps.
## Prerequisites
- RustFS service deployed and reachable from the Scanner control plane (`http(s)://rustfs:8080`).
- Existing MinIO/S3 credentials with read access to the current bucket.
- CLI environment with the StellaOps source tree (for the migration tool) and `dotnet 10` SDK.
- Maintenance window sized to copy all artefacts (migration is read-only on the source bucket).
## 1. Snapshot source bucket (optional but recommended)
If the MinIO deployment offers versioning or snapshots, take one before migrating. For non-versioned
deployments, capture an external backup (e.g., `mc mirror` to offline storage).
## 2. Dry-run the migrator
```
dotnet run --project tools/RustFsMigrator -- \
--s3-bucket scanner-artifacts \
--s3-endpoint http://stellaops-minio:9000 \
--s3-access-key stellaops \
--s3-secret-key dev-minio-secret \
--rustfs-endpoint http://stellaops-rustfs:8080 \
--rustfs-bucket scanner-artifacts \
--prefix scanner/ \
--dry-run
```
The dry-run enumerates keys and reports the object count without writing to RustFS. Use this to
estimate migration time.
## 3. Execute migration
Remove the `--dry-run` flag to copy data. Optional flags:
- `--immutable` mark all migrated objects as immutable (`X-RustFS-Immutable`).
- `--retain-days 365` request retention (in days) via `X-RustFS-Retain-Seconds`.
- `--rustfs-api-key-header` / `--rustfs-api-key` provide auth headers when RustFS is protected.
The tool streams each object from S3 and performs an idempotent `PUT` to RustFS preserving the key
structure (e.g., `scanner/layers/<sha256>/sbom.cdx.json.zst`).
## 4. Verify sample objects
Pick a handful of SBOM digests and confirm:
1. `GET /api/v1/buckets/<bucket>/objects/<key>` returns the expected payload (size + SHA-256).
2. Scanner WebService configured with `scanner.artifactStore.driver = "rustfs"` can fetch the same
artefacts (Smoke test: `GET /api/v1/scanner/sboms/<digest>?format=cdx-json`).
## 5. Switch Scanner hosts
Update configuration (Helm/Compose/environment) to set:
```
scanner:
artifactStore:
driver: rustfs
endpoint: http://stellaops-rustfs:8080
bucket: scanner-artifacts
timeoutSeconds: 30
```
Redeploy Scanner WebService and Worker. Monitor logs for `RustFS` upload/download messages and
Prometheus scrape (`rustfs_requests_total`).
## 6. Cleanup legacy MinIO (optional)
After a complete migration and validation period, decommission the MinIO bucket or repurpose it for
other components (Concelier still supports S3). Ensure backups reference RustFS snapshots going
forward.
## Troubleshooting
- **Uploads fail (HTTP 4xx/5xx):** Check RustFS logs and confirm API key headers. Re-run the migrator
for the affected keys.
- **Missing objects post-cutover:** Re-run the migrator with the specific `--prefix`. The tool is
idempotent and safely overwrites existing objects.
- **Performance tuning:** Run multiple instances of the migrator with disjoint prefixes if needed; the
RustFS API is stateless and supports parallel PUTs.