Smart-Diff Benchmark Suite

Purpose: Prove deterministic smart-diff reduces noise compared to naive diff.
Status: Active
Sprint: SPRINT_3850_0001_0001 (Competitive Gap Closure)

Overview

The Smart-Diff feature enables incremental scanning by:

  1. Computing structural diffs of SBOMs/dependencies
  2. Identifying only changed components
  3. Avoiding redundant scanning of unchanged packages
  4. Producing deterministic, reproducible diff results
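
The steps above can be illustrated with a minimal Python sketch. It is not the StellaOps implementation: the function names, the input shape (a flat list of PURL strings), and the package versions are assumptions made for illustration. Components are keyed by PURL without the version, so a version change shows up as "changed" rather than as a removal plus an addition.

```python
from typing import Dict, List, Tuple


def split_purl(purl: str) -> Tuple[str, str]:
    """Split 'pkg:apk/alpine/musl@1.2.4-r1' into (name part, version).

    Qualifiers ('?...') and subpaths ('#...') are dropped in this sketch.
    """
    base = purl.split("?", 1)[0].split("#", 1)[0]
    name, _, version = base.partition("@")
    return name, version


def diff_components(old: List[str], new: List[str]) -> Dict[str, list]:
    """Return added, removed, and changed components, each sorted by PURL."""
    old_map = dict(split_purl(p) for p in old)
    new_map = dict(split_purl(p) for p in new)
    added = sorted(k for k in new_map if k not in old_map)
    removed = sorted(k for k in old_map if k not in new_map)
    changed = sorted(
        (k, old_map[k], new_map[k])
        for k in old_map.keys() & new_map.keys()
        if old_map[k] != new_map[k]
    )
    return {"added": added, "removed": removed, "changed": changed}


# Hypothetical component lists; real input would come from the SBOM fixtures.
old_sbom = [
    "pkg:apk/alpine/musl@1.2.4-r1",
    "pkg:apk/alpine/busybox@1.36.1-r2",
    "pkg:apk/alpine/zlib@1.2.13-r1",
]
new_sbom = [
    "pkg:apk/alpine/zlib@1.2.13-r1",
    "pkg:apk/alpine/musl@1.2.4-r2",
    "pkg:apk/alpine/openssl@3.1.4-r0",
]
result = diff_components(old_sbom, new_sbom)
```

Only the components listed under "added" and "changed" would need rescanning; zlib is untouched and can be skipped.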

Test Cases

TC-001: Layer-Aware Diff

Tests that Smart-Diff correctly handles container layer changes:

  • Adding a layer
  • Removing a layer
  • Modifying a layer (same hash, different content)
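
As a rough illustration of the add and remove cases, the sketch below compares two ordered lists of layer digests. The function name and the digests are hypothetical, and the manifest format used by the fixtures is not assumed here. Because it compares digests only, this sketch cannot detect the third case (same hash, different content), which would require inspecting layer contents.

```python
from typing import Dict, List


def diff_layers(old_layers: List[str], new_layers: List[str]) -> Dict[str, List[str]]:
    """Report layers present on only one side, preserving manifest order."""
    old_set, new_set = set(old_layers), set(new_layers)
    return {
        "added": [d for d in new_layers if d not in old_set],
        "removed": [d for d in old_layers if d not in new_set],
    }


# Hypothetical, shortened digests for readability.
before = ["sha256:aaa", "sha256:bbb", "sha256:ccc"]
after = ["sha256:aaa", "sha256:ccc", "sha256:ddd"]
layer_diff = diff_layers(before, after)
```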

TC-002: Package Version Diff

Tests accurate detection of package version changes:

  • Minor version bump
  • Major version bump
  • Pre-release version handling
  • Epoch handling (RPM)
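
The following sketch shows one way such bumps could be classified. It assumes a simplified version shape, `[epoch:]X.Y.Z[-suffix]` with numeric dotted parts, and it is not RPM's actual version comparison algorithm. The function names and labels are illustrative only.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, Tuple[int, ...], str]:
    """Parse '[epoch:]X.Y.Z[-suffix]' into (epoch, numeric parts, suffix).

    A missing epoch is treated as 0. Non-numeric dotted parts raise ValueError.
    """
    epoch_text, sep, rest = text.partition(":")
    if sep:
        epoch = int(epoch_text)
    else:
        epoch, rest = 0, text
    core, _, suffix = rest.partition("-")
    parts = tuple(int(p) for p in core.split("."))
    return epoch, parts, suffix


def classify_bump(old: str, new: str) -> str:
    """Label the most significant difference between two versions."""
    o_epoch, o_parts, o_suffix = parse_version(old)
    n_epoch, n_parts, n_suffix = parse_version(new)
    if o_epoch != n_epoch:
        return "epoch"
    # Pad the shorter version with zeros so 1.2 and 1.2.0 compare equal.
    width = max(len(o_parts), len(n_parts))
    o = o_parts + (0,) * (width - len(o_parts))
    n = n_parts + (0,) * (width - len(n_parts))
    labels = ["major", "minor", "patch"]
    for i, (a, b) in enumerate(zip(o, n)):
        if a != b:
            return labels[i] if i < len(labels) else "patch"
    if o_suffix != n_suffix:
        return "suffix"  # e.g. a pre-release tag or a release number changed
    return "none"
```

With this sketch, `classify_bump("1.2.3", "1.3.0")` yields "minor", `classify_bump("1.2.3", "2.0.0")` yields "major", and `classify_bump("1.2.3", "1:1.0.0")` yields "epoch" because the epoch is compared before anything else.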

TC-003: Noise Reduction

Compares Smart-Diff output against naive diff output for real-world images:

  • Measure CVE count reduction
  • Measure scanning time reduction
  • Verify determinism (same inputs → same outputs)

TC-004: Deterministic Ordering

Verifies that diff results are:

  • Sorted by component PURL
  • Ordered consistently across runs
  • Independent of filesystem ordering
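
A check along these lines could be sketched as follows, assuming diff entries are dictionaries with a `purl` field (an assumption, since the diff schema is not shown here). Entries are sorted by PURL, serialized with sorted keys and fixed separators, and hashed, so that any input order produces the same digest.

```python
import hashlib
import json
from typing import Dict, List


def canonical_digest(entries: List[Dict[str, str]]) -> str:
    """Return a SHA-256 hex digest that does not depend on input order."""
    # Sort by PURL; the serialized entry breaks ties so duplicates stay stable.
    ordered = sorted(
        entries,
        key=lambda e: (e["purl"], json.dumps(e, sort_keys=True)),
    )
    payload = json.dumps(
        ordered, sort_keys=True, separators=(",", ":"), ensure_ascii=False
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


# Hypothetical diff entries; the "change" field is illustrative.
entries = [
    {"purl": "pkg:apk/alpine/zlib", "change": "unchanged"},
    {"purl": "pkg:apk/alpine/musl", "change": "changed"},
    {"purl": "pkg:apk/alpine/openssl", "change": "added"},
]
digest_forward = canonical_digest(entries)
digest_reversed = canonical_digest(list(reversed(entries)))
```

Comparing `digest_forward` with `digest_reversed` gives a quick signal that the ordering does not leak from the input, for example from directory enumeration order.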

Fixtures

fixtures/
├── base-alpine-3.18.sbom.cdx.json
├── base-alpine-3.19.sbom.cdx.json
├── layer-added.manifest.json
├── layer-removed.manifest.json
├── version-bump-minor.sbom.cdx.json
├── version-bump-major.sbom.cdx.json
└── expected/
    ├── tc001-layer-added.diff.json
    ├── tc001-layer-removed.diff.json
    ├── tc002-minor-bump.diff.json
    ├── tc002-major-bump.diff.json
    └── tc003-noise-reduction.metrics.json

Running the Suite

# Run all smart-diff tests
dotnet test tests/StellaOps.Scanner.SmartDiff.Tests

# Run benchmark comparison
./run-benchmark.sh --baseline naive --compare smart

# Generate metrics report
./tools/analyze.py results/ --output metrics.csv

Metrics Collected

| Metric | Description |
|--------|-------------|
| diff_time_ms | Time to compute the diff, in milliseconds |
| changed_packages | Number of packages marked as changed |
| false_positive_rate | Fraction of packages incorrectly flagged as changed |
| determinism_score | 1.0 if all runs produce identical output |
| noise_reduction_pct | Percentage reduction relative to naive diff |
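
The exact formulas are not spelled out above, so the sketch below shows one plausible reading of two of these metrics. Both definitions are assumptions: `determinism_score` is taken as the share of runs matching the most common output (1.0 when all runs agree), and `false_positive_rate` as the share of flagged packages that did not truly change.

```python
from collections import Counter
from typing import Iterable, Set


def determinism_score(run_outputs: Iterable[str]) -> float:
    """Share of runs whose output matches the most common output."""
    outputs = list(run_outputs)
    if not outputs:
        raise ValueError("need at least one run")
    _, most_common_count = Counter(outputs).most_common(1)[0]
    return most_common_count / len(outputs)


def false_positive_rate(flagged: Set[str], truly_changed: Set[str]) -> float:
    """Share of flagged packages that are not in the ground-truth change set."""
    if not flagged:
        return 0.0
    return len(flagged - truly_changed) / len(flagged)
```

For example, three identical run outputs score 1.0, while one divergent run out of four scores 0.75.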

Expected Results

For typical Alpine base image upgrades (3.18 → 3.19):

  • Naive diff: ~150 packages flagged as changed
  • Smart diff: ~12 packages actually changed
  • Noise reduction: ~92%
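
The noise reduction figure follows directly from the two package counts. A small sketch of the arithmetic, with a hypothetical helper name:

```python
def noise_reduction_pct(naive_flagged: int, smart_flagged: int) -> float:
    """Percentage of naive-diff findings that smart diff no longer flags."""
    if naive_flagged <= 0:
        raise ValueError("naive_flagged must be positive")
    return 100.0 * (naive_flagged - smart_flagged) / naive_flagged


# (150 - 12) / 150 = 0.92, i.e. 92%
print(noise_reduction_pct(150, 12))  # 92.0
```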

Integration with CI

# .gitea/workflows/bench-smart-diff.yaml
name: Smart-Diff Benchmark
on:
  push:
    paths:
      - 'src/Scanner/__Libraries/StellaOps.Scanner.SmartDiff/**'
      - 'bench/smart-diff/**'

jobs:
  benchmark:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - name: Run Smart-Diff Benchmark
        run: ./bench/smart-diff/run-benchmark.sh
      - name: Upload Results
        uses: actions/upload-artifact@v4
        with:
          name: smart-diff-results
          path: bench/smart-diff/results/