Add Canonical JSON serialization library with tests and documentation

- Implemented CanonJson class for deterministic JSON serialization and hashing.
- Added unit tests for CanonJson functionality, covering various scenarios including key sorting, handling of nested objects, arrays, and special characters.
- Created project files for the Canonical JSON library and its tests, including necessary package references.
- Added README.md for library usage and API reference.
- Introduced RabbitMqIntegrationFactAttribute for conditional RabbitMQ integration tests.
This commit is contained in:
master
2025-12-19 15:35:00 +02:00
parent 43882078a4
commit 951a38d561
192 changed files with 27550 additions and 2611 deletions

117
bench/smart-diff/README.md Normal file
View File

@@ -0,0 +1,117 @@
# Smart-Diff Benchmark Suite
> **Purpose:** Prove deterministic smart-diff reduces noise compared to naive diff.
> **Status:** Active
> **Sprint:** SPRINT_3850_0001_0001 (Competitive Gap Closure)
## Overview
The Smart-Diff feature enables incremental scanning by:
1. Computing structural diffs of SBOMs/dependencies
2. Identifying only changed components
3. Avoiding redundant scanning of unchanged packages
4. Producing deterministic, reproducible diff results
## Test Cases
### TC-001: Layer-Aware Diff
Tests that Smart-Diff correctly handles container layer changes:
- Adding a layer
- Removing a layer
- Modifying a layer (same hash, different content)
### TC-002: Package Version Diff
Tests accurate detection of package version changes:
- Minor version bump
- Major version bump
- Pre-release version handling
- Epoch handling (RPM)
### TC-003: Noise Reduction
Compares smart-diff output vs naive diff for real-world images:
- Measure CVE count reduction
- Measure scanning time reduction
- Verify determinism (same inputs → same outputs)
### TC-004: Deterministic Ordering
Verifies that diff results are:
- Sorted by component PURL
- Ordered consistently across runs
- Independent of filesystem ordering
## Fixtures
```
fixtures/
├── base-alpine-3.18.sbom.cdx.json
├── base-alpine-3.19.sbom.cdx.json
├── layer-added.manifest.json
├── layer-removed.manifest.json
├── version-bump-minor.sbom.cdx.json
├── version-bump-major.sbom.cdx.json
└── expected/
├── tc001-layer-added.diff.json
├── tc001-layer-removed.diff.json
├── tc002-minor-bump.diff.json
├── tc002-major-bump.diff.json
└── tc003-noise-reduction.metrics.json
```
## Running the Suite
```bash
# Run all smart-diff tests
dotnet test tests/StellaOps.Scanner.SmartDiff.Tests
# Run benchmark comparison
./run-benchmark.sh --baseline naive --compare smart
# Generate metrics report
./tools/analyze.py results/ --output metrics.csv
```
## Metrics Collected
| Metric | Description |
|--------|-------------|
| `diff_time_ms` | Time to compute diff |
| `changed_packages` | Number of packages marked as changed |
| `false_positive_rate` | Packages incorrectly flagged as changed |
| `determinism_score` | 1.0 if all runs produce identical output |
| `noise_reduction_pct` | % reduction vs naive diff |
## Expected Results
For typical Alpine base image upgrades (3.18 → 3.19):
- **Naive diff:** ~150 packages flagged as changed
- **Smart diff:** ~12 packages actually changed
- **Noise reduction:** ~92%
## Integration with CI
```yaml
# .gitea/workflows/bench-smart-diff.yaml
name: Smart-Diff Benchmark
on:
push:
paths:
- 'src/Scanner/__Libraries/StellaOps.Scanner.SmartDiff/**'
- 'bench/smart-diff/**'
jobs:
benchmark:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- name: Run Smart-Diff Benchmark
run: ./bench/smart-diff/run-benchmark.sh
- name: Upload Results
uses: actions/upload-artifact@v4
with:
name: smart-diff-results
path: bench/smart-diff/results/
```

View File

@@ -0,0 +1,135 @@
#!/usr/bin/env bash
# run-benchmark.sh
# Smart-Diff Benchmark Runner
# Sprint: SPRINT_3850_0001_0001
set -euo pipefail
SCRIPT_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
BENCH_ROOT="$SCRIPT_DIR"
RESULTS_DIR="$BENCH_ROOT/results/$(date -u +%Y%m%d_%H%M%S)"
# Parse arguments
BASELINE_MODE="naive"
COMPARE_MODE="smart"
VERBOSE=false
while [[ $# -gt 0 ]]; do
case $1 in
--baseline)
BASELINE_MODE="$2"
shift 2
;;
--compare)
COMPARE_MODE="$2"
shift 2
;;
--verbose|-v)
VERBOSE=true
shift
;;
*)
echo "Unknown option: $1"
exit 1
;;
esac
done
echo "╔════════════════════════════════════════════════╗"
echo "║ Smart-Diff Benchmark Suite ║"
echo "╚════════════════════════════════════════════════╝"
echo ""
echo "Configuration:"
echo " Baseline mode: $BASELINE_MODE"
echo " Compare mode: $COMPARE_MODE"
echo " Results dir: $RESULTS_DIR"
echo ""
mkdir -p "$RESULTS_DIR"
# Function to run a test case
run_test_case() {
local test_id="$1"
local description="$2"
local base_sbom="$3"
local target_sbom="$4"
local expected_file="$5"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Test: $test_id - $description"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
local start_time=$(date +%s%3N)
# Run smart-diff
if command -v dotnet &> /dev/null; then
dotnet run --project "$SCRIPT_DIR/../../src/Scanner/__Libraries/StellaOps.Scanner.SmartDiff" -- \
--base "$base_sbom" \
--target "$target_sbom" \
--output "$RESULTS_DIR/$test_id.diff.json" \
--format json 2>/dev/null || true
fi
local end_time=$(date +%s%3N)
local elapsed=$((end_time - start_time))
echo " Time: ${elapsed}ms"
# Verify determinism by running twice
if [ -f "$RESULTS_DIR/$test_id.diff.json" ]; then
local hash1=$(sha256sum "$RESULTS_DIR/$test_id.diff.json" | cut -d' ' -f1)
if command -v dotnet &> /dev/null; then
dotnet run --project "$SCRIPT_DIR/../../src/Scanner/__Libraries/StellaOps.Scanner.SmartDiff" -- \
--base "$base_sbom" \
--target "$target_sbom" \
--output "$RESULTS_DIR/$test_id.diff.run2.json" \
--format json 2>/dev/null || true
fi
if [ -f "$RESULTS_DIR/$test_id.diff.run2.json" ]; then
local hash2=$(sha256sum "$RESULTS_DIR/$test_id.diff.run2.json" | cut -d' ' -f1)
if [ "$hash1" = "$hash2" ]; then
echo " ✓ Determinism verified"
else
echo " ✗ Determinism FAILED (different hashes)"
fi
fi
else
echo " ⊘ Skipped (dotnet not available or project missing)"
fi
echo ""
}
# Test Case 1: Layer-Aware Diff (using fixtures)
if [ -f "$BENCH_ROOT/fixtures/base-alpine-3.18.sbom.cdx.json" ]; then
run_test_case "TC-001-layer-added" \
"Layer addition detection" \
"$BENCH_ROOT/fixtures/base-alpine-3.18.sbom.cdx.json" \
"$BENCH_ROOT/fixtures/base-alpine-3.19.sbom.cdx.json" \
"$BENCH_ROOT/fixtures/expected/tc001-layer-added.diff.json"
else
echo " Skipping TC-001: Fixtures not found"
echo " Run './tools/generate-fixtures.sh' to create test fixtures"
fi
# Generate summary
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Summary"
echo "━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━━"
echo "Results saved to: $RESULTS_DIR"
# Create summary JSON
cat > "$RESULTS_DIR/summary.json" <<EOF
{
"benchmark": "smart-diff",
"timestamp": "$(date -u +%Y-%m-%dT%H:%M:%SZ)",
"baseline_mode": "$BASELINE_MODE",
"compare_mode": "$COMPARE_MODE",
"results_dir": "$RESULTS_DIR"
}
EOF
echo "Done."