audit work, fixed StellaOps.sln warnings/errors, fixed tests, sprints work, new advisories
docs/dev/contributing/api-contracts.md (new file, 58 lines)

# Contributing to API Contracts

Last updated: 2025-11-25

## Scope

Guidelines for editing service OpenAPI specs, lint rules, compatibility checks, and release artefacts across StellaOps services.

## Required tools

- Node.js 20.x + pnpm 9.x
- Spectral CLI (invoked via `pnpm api:lint` in `src/Api/StellaOps.Api.OpenApi`)
- `diff2html` (optional) for human-friendly compat reports

## Workflow (per change)

1) Edit service OAS under `src/Api/StellaOps.Api.OpenApi/services/<service>/*.yaml`.
2) Run lint + compose + compat + changelog from the repo root:

```bash
pnpm --filter @stella/api-openapi api:lint       # spectral
pnpm --filter @stella/api-openapi api:compose    # build stella.yaml
pnpm --filter @stella/api-openapi api:compat     # compare against baseline
pnpm --filter @stella/api-openapi api:changelog  # generate digest/signature
```

3) Review outputs:
   - `out/api/stella.yaml` (composed spec)
   - `out/api/compat-report.json` (+ `.html` if generated)
   - `out/api/changelog/` (digest + signature for SDK pipeline)
4) Update examples: ensure request/response examples exist for changed endpoints; add them to the `examples/` directories.
5) Commit changes with notes on breaking/additive results; attach compat report paths.

## Lint rules (Spectral)

- House rules live in `.spectral.yaml` under `src/Api/StellaOps.Api.OpenApi`.
- Enforce: tagged operations, error envelope shape, schema refs reuse, pagination tokens, RBAC scopes, standard headers (`traceparent`, `x-correlation-id`).

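As an illustration only (not a copy of the real ruleset), a house rule enforcing tagged operations might look like the sketch below; the rule name and JSONPath selector are hypothetical:

```yaml
extends: ["spectral:oas"]
rules:
  stella-operation-tags:   # hypothetical rule name
    description: Every operation must declare at least one tag.
    severity: error
    given: "$.paths[*][*]"   # approximate selector for operations
    then:
      field: tags
      function: truthy
```
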
## Compatibility checks

- Baseline file: `stella-baseline.yaml` (kept in repo under `out/api/` and updated per release).
- `api:compat` flags additive/breaking/unchanged deltas. Breaking changes require approval from the API Governance Guild and affected service guilds.

## Examples & error envelopes

- Every operation must have at least one request + response example.
- Error responses must use the standard envelope (`error.code`, `error.message`, `trace_id`); see the sketch below.

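For reference, a response following that envelope might look like this (the field values and the exact placement of `trace_id` are illustrative):

```json
{
  "error": {
    "code": "POLICY_NOT_FOUND",
    "message": "Policy 'p-123' does not exist."
  },
  "trace_id": "00-4bf92f3577b34da6a3ce929d0e0e4736-00f067aa0ba902b7-01"
}
```
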
## Offline/air-gap

- Keep `pnpm-lock.yaml` pinned; store the `node_modules/.pnpm` cache in the Offline Kit when needed.
- Include composed spec, compat report, and changelog artefacts in offline bundles under `offline/api/` with checksums.

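One way to prime that cache with stock pnpm commands (a sketch; adjust paths to the Offline Kit layout):

```bash
pnpm fetch              # populate the local store from pnpm-lock.yaml while online
pnpm install --offline  # later, resolve strictly from the store with no network
```
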
## Release handoff

- Deliverables for each release:
  - `out/api/stella.yaml`
  - `out/api/compat-report.json` (+ `.html` if produced)
  - `out/api/changelog/*` (digest, signature, manifest)
- Provide SHA256 checksums for artefacts and note `source_commit` + `timestamp` in release notes.

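For the checksums, something along these lines works (a sketch using GNU coreutils):

```bash
sha256sum out/api/stella.yaml out/api/compat-report.json out/api/changelog/* \
  > out/api/SHA256SUMS
```
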
## Review checklist

- [ ] Lint clean (`api:lint`)
- [ ] Examples present for changed endpoints
- [ ] Compat report reviewed (additive/breaking noted)
- [ ] Changelog artefacts generated + checksums
- [ ] Offline bundle artefacts staged (if required)
- [ ] Docs/UI/SDK owners notified of breaking changes

docs/dev/contributing/canonicalization-determinism.md (new file, 336 lines)

# Canonicalization & Determinism Patterns

**Version:** 1.0
**Date:** December 2025
**Sprint:** SPRINT_20251226_007_BE_determinism_gaps (DET-GAP-20)

> **Audience:** All StellaOps contributors working on code that produces digests, attestations, or replayable outputs.
> **Goal:** Ensure byte-identical outputs for identical inputs across platforms, time, and Rust/Go/Node re-implementations.

---

## 1. Why Determinism Matters

StellaOps is built on **proof-of-state**: every verdict, attestation, and replay must be reproducible. Non-determinism breaks:

- **Signature verification:** Different serialization → different digest → invalid signature.
- **Replay guarantees:** Feed snapshots that produce different hashes cannot be replayed.
- **Audit trails:** Compliance teams require bit-exact reproduction of historical scans.
- **Cross-platform compatibility:** Windows/Linux/macOS must produce identical outputs.

---

## 2. RFC 8785 JSON Canonicalization Scheme (JCS)

All JSON that participates in digest computation **must** use RFC 8785 JCS. This includes:

- Attestation payloads (DSSE)
- Verdict JSON
- Policy evaluation results
- Feed snapshot manifests
- Proof bundles

### 2.1 The Rfc8785JsonCanonicalizer

Use the `Rfc8785JsonCanonicalizer` class for all canonical JSON operations:

```csharp
using StellaOps.Attestor.ProofChain.Json;

// Create canonicalizer (optionally with NFC normalization)
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);

// Canonicalize JSON from a string
string canonical = canonicalizer.Canonicalize(jsonString);

// ... or from a JsonElement
canonical = canonicalizer.Canonicalize(jsonElement);
```

### 2.2 JCS Rules Summary

RFC 8785 requires:

1. **No whitespace** between tokens.
2. **Lexicographic key ordering** within objects.
3. **Number serialization:** No leading zeros, no trailing zeros after the decimal point, integers without a decimal point.
4. **String escaping:** Minimal escaping (only `"`, `\`, and control characters).
5. **UTF-8 encoding** without BOM.

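A quick illustration of those rules in combination, assuming the canonicalizer follows RFC 8785 as described above:

```csharp
var canonicalizer = new Rfc8785JsonCanonicalizer();

// Keys are reordered lexicographically, whitespace is dropped,
// and 2.50 is emitted as 2.5 per the number rules.
string canonical = canonicalizer.Canonicalize("{ \"b\": 2.50, \"a\": 10 }");
// canonical == "{\"a\":10,\"b\":2.5}"
```
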
### 2.3 Common Mistakes

❌ **Wrong:** Using `JsonSerializer.Serialize()` directly for digest input.

```csharp
// WRONG - non-deterministic ordering
var json = JsonSerializer.Serialize(obj);
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(json));
```

✅ **Correct:** Canonicalize before hashing.

```csharp
// CORRECT - deterministic
var canonicalizer = new Rfc8785JsonCanonicalizer();
var canonical = canonicalizer.Canonicalize(obj);
var hash = SHA256.HashData(Encoding.UTF8.GetBytes(canonical));
```

---

## 3. Unicode NFC Normalization

Different platforms may store the same string in different Unicode normalization forms. Enable NFC normalization when:

- Processing user-supplied strings
- Aggregating data from multiple sources
- Working with file paths or identifiers from different systems

```csharp
// Enable NFC for cross-platform string stability
var canonicalizer = new Rfc8785JsonCanonicalizer(enableNfcNormalization: true);
```

When NFC is enabled, all strings are normalized via `string.Normalize(NormalizationForm.FormC)` before serialization.

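To see why this matters, consider `é`: the composed code point U+00E9 and the decomposed pair `e` + U+0301 are different byte sequences that render identically. A minimal sketch:

```csharp
using System.Text;

var composed = "\u00E9";     // é as a single code point
var decomposed = "e\u0301";  // e + combining acute accent

Console.WriteLine(composed == decomposed);                                     // False
Console.WriteLine(composed == decomposed.Normalize(NormalizationForm.FormC));  // True
// Without NFC, these two inputs would hash to different digests.
```
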
---

## 4. Resolver Boundary Pattern

**Key principle:** All data entering or leaving a "resolver" (a service that produces verdicts, attestations, or replayable state) must be canonicalized.

### 4.1 What Is a Resolver Boundary?

A resolver boundary is any point where:

- Data is **serialized** for storage, transmission, or signing
- Data is **hashed** to produce a digest
- Data is **compared** for equality in replay validation

### 4.2 Boundary Enforcement

At resolver boundaries:

1. **Canonicalize** all JSON payloads using `Rfc8785JsonCanonicalizer`.
2. **Sort** collections deterministically (alphabetically by key or ID).
3. **Normalize** timestamps to ISO 8601 UTC with a `Z` suffix.
4. **Freeze** dictionaries using `FrozenDictionary` for stable iteration order.

### 4.3 Example: Feed Snapshot Coordinator

```csharp
public sealed class FeedSnapshotCoordinatorService : IFeedSnapshotCoordinator
{
    private readonly FrozenDictionary<string, IFeedSourceProvider> _providers;

    public FeedSnapshotCoordinatorService(IEnumerable<IFeedSourceProvider> providers, ...)
    {
        // Sort providers alphabetically for deterministic digest computation
        _providers = providers
            .OrderBy(p => p.SourceId, StringComparer.Ordinal)
            .ToFrozenDictionary(p => p.SourceId, p => p, StringComparer.OrdinalIgnoreCase);
    }

    private string ComputeCompositeDigest(IReadOnlyList<SourceSnapshot> sources)
    {
        // Sort source digests ordinally so the composite hash is order-independent
        using var sha256 = SHA256.Create();
        foreach (var source in sources.OrderBy(s => s.SourceId, StringComparer.Ordinal))
        {
            // Append each source digest to the hash computation
            var digestBytes = Encoding.UTF8.GetBytes(source.Digest);
            sha256.TransformBlock(digestBytes, 0, digestBytes.Length, null, 0);
        }
        sha256.TransformFinalBlock([], 0, 0);
        return $"sha256:{Convert.ToHexString(sha256.Hash!).ToLowerInvariant()}";
    }
}
```

---

## 5. Timestamp Handling

### 5.1 Rules

1. **Always use UTC** - never local time.
2. **ISO 8601 format** with `Z` suffix: `2025-12-27T14:30:00Z`
3. **Consistent precision** - truncate to seconds unless milliseconds are required.
4. **Use `TimeProvider`** for testability.

### 5.2 Example

```csharp
// CORRECT - UTC with Z suffix
var timestamp = timeProvider.GetUtcNow().ToString("yyyy-MM-ddTHH:mm:ssZ");

// WRONG - local time
var wrong = DateTime.Now.ToString("o");

// WRONG - inconsistent format
var wrong2 = DateTimeOffset.UtcNow.ToString();
```

---

## 6. Numeric Stability

### 6.1 Avoid Floating Point for Determinism

Floating-point arithmetic can produce different results on different platforms. For deterministic values:

- Use `decimal` for scores, percentages, and monetary values.
- Use `int` or `long` for counts and identifiers.
- If floating-point is unavoidable, document the acceptable epsilon and rounding rules (see the sketch after this list).

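A minimal sketch of the difference:

```csharp
double d = 0.1 + 0.2;
Console.WriteLine(d == 0.3);   // False: d is 0.30000000000000004

decimal m = 0.1m + 0.2m;
Console.WriteLine(m == 0.3m);  // True: decimal represents these values exactly
```
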
### 6.2 Number Serialization

RFC 8785 requires specific number formatting:

- Integers: no decimal point (`42`, not `42.0`)
- Decimals: no trailing zeros (`3.14`, not `3.140`)
- No leading zeros (`0.5`, not `00.5`)

The `Rfc8785JsonCanonicalizer` handles this automatically.

---

## 7. Collection Ordering

### 7.1 Rule

All collections that participate in digest computation must have **deterministic order**.

### 7.2 Implementation

```csharp
// CORRECT - use FrozenDictionary for stable iteration
var orderedDict = items
    .OrderBy(x => x.Key, StringComparer.Ordinal)
    .ToFrozenDictionary(x => x.Key, x => x.Value);

// CORRECT - sort before iteration
foreach (var item in items.OrderBy(x => x.Id, StringComparer.Ordinal))
{
    // ...
}

// WRONG - iteration order is undefined
foreach (var item in dictionary)
{
    // Order may vary between runs
}
```

---

## 8. Audit Hash Logging

For debugging determinism issues, use the `AuditHashLogger`:

```csharp
using StellaOps.Attestor.ProofChain.Audit;

var auditLogger = new AuditHashLogger(logger);

// Log both raw and canonical hashes
auditLogger.LogHashAudit(
    rawContent,
    canonicalContent,
    "sha256:abc...",
    "verdict",
    "scan-123",
    metadata);
```

This enables post-mortem analysis of canonicalization issues.

---

## 9. Testing Determinism

### 9.1 Required Tests

Every component that produces digests must have tests verifying:

1. **Idempotency:** Same input → same digest (across multiple calls).
2. **Permutation invariance:** Reordering input collections → same digest.
3. **Cross-platform:** Windows/Linux/macOS produce identical outputs.

### 9.2 Example Test

```csharp
[Fact]
public async Task CreateSnapshot_ProducesDeterministicDigest()
{
    // Arrange
    var sources = CreateTestSources();

    // Act - create multiple snapshots with the same data
    var bundle1 = await coordinator.CreateSnapshotAsync();
    var bundle2 = await coordinator.CreateSnapshotAsync();

    // Assert - digests must be identical
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}

[Fact]
public async Task CreateSnapshot_OrderIndependent()
{
    // Arrange - sources in different orders
    var sourcesAscending = sources.OrderBy(s => s.Id);
    var sourcesDescending = sources.OrderByDescending(s => s.Id);

    // Act
    var bundle1 = await CreateWithSources(sourcesAscending);
    var bundle2 = await CreateWithSources(sourcesDescending);

    // Assert - digest must be identical regardless of input order
    Assert.Equal(bundle1.CompositeDigest, bundle2.CompositeDigest);
}
```

---

## 10. Determinism Manifest Schema

All replayable artifacts must include a determinism manifest conforming to the JSON Schema at:

`docs/technical/testing/schemas/determinism-manifest.schema.json`

Key fields:

- `schemaVersion`: Must be `"1.0"`.
- `artifactType`: One of `verdict`, `attestation`, `snapshot`, `proof`, `sbom`, `vex`.
- `hashAlgorithm`: One of `sha256`, `sha384`, `sha512`.
- `ordering`: One of `alphabetical`, `timestamp`, `insertion`, `canonical`.
- `determinismGuarantee`: One of `strict`, `relaxed`, `best_effort`.

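Putting those fields together, a manifest might look like the following (values are illustrative; consult the schema for the full field list):

```json
{
  "schemaVersion": "1.0",
  "artifactType": "snapshot",
  "hashAlgorithm": "sha256",
  "ordering": "alphabetical",
  "determinismGuarantee": "strict"
}
```
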
---

## 11. Checklist for Contributors

Before submitting a PR that involves digests or attestations:

- [ ] JSON is canonicalized via `Rfc8785JsonCanonicalizer` before hashing.
- [ ] NFC normalization is enabled if user-supplied strings are involved.
- [ ] Collections are sorted deterministically before iteration.
- [ ] Timestamps are UTC with ISO 8601 format and `Z` suffix.
- [ ] Numeric values avoid floating-point where possible.
- [ ] Unit tests verify digest idempotency and permutation invariance.
- [ ] Determinism manifest schema is validated for new artifact types.

---

## 12. Related Documents

- [docs/technical/testing/schemas/determinism-manifest.schema.json](../technical/testing/schemas/determinism-manifest.schema.json) - JSON Schema for manifests
- [docs/modules/policy/design/policy-determinism-tests.md](../modules/policy/design/policy-determinism-tests.md) - Policy engine determinism
- [docs/technical/testing/TEST_SUITE_OVERVIEW.md](../technical/testing/TEST_SUITE_OVERVIEW.md) - Testing strategy

---

## 13. Change Log

| Version | Date       | Notes                           |
|---------|------------|---------------------------------|
| 1.0     | 2025-12-27 | Initial version per DET-GAP-20. |

docs/dev/contributing/corpus-contribution-guide.md (new file, 301 lines)

# Corpus Contribution Guide

**Sprint:** SPRINT_3500_0003_0001
**Task:** CORPUS-014 - Document corpus contribution guide

## Overview

The Ground-Truth Corpus is a collection of validated test samples used to measure scanner accuracy. Each sample has a known reachability status and expected findings, enabling deterministic quality metrics.

## Corpus Structure

```
datasets/reachability/
├── corpus.json                  # Index of all samples
├── schemas/
│   └── corpus-sample.v1.json    # JSON schema for samples
├── samples/
│   ├── gt-0001/                 # Sample directory
│   │   ├── sample.json          # Sample metadata
│   │   ├── expected.json        # Expected findings
│   │   ├── sbom.json            # Input SBOM
│   │   └── source/              # Optional source files
│   └── ...
└── baselines/
    └── v1.0.0.json              # Baseline metrics
```

## Sample Format

### sample.json

```json
{
  "id": "gt-0001",
  "name": "Python SQL Injection - Reachable",
  "description": "Flask app with reachable SQL injection via user input",
  "language": "python",
  "ecosystem": "pypi",
  "scenario": "webapi",
  "entrypoints": ["app.py:main"],
  "reachability_tier": "tainted_sink",
  "created_at": "2025-01-15T00:00:00Z",
  "author": "security-team",
  "tags": ["sql-injection", "flask", "reachable"]
}
```

### expected.json

```json
{
  "findings": [
    {
      "vuln_key": "CVE-2024-1234:pkg:pypi/sqlalchemy@1.4.0",
      "tier": "tainted_sink",
      "rule_key": "py.sql.injection.param_concat",
      "sink_class": "sql",
      "location_hint": "app.py:42"
    }
  ]
}
```

## Contributing a Sample

### Step 1: Choose a Scenario

Select a scenario that is not yet well covered in the corpus:

| Scenario | Description              | Example                    |
|----------|--------------------------|----------------------------|
| `webapi` | Web application endpoint | Flask, FastAPI, Express    |
| `cli`    | Command-line tool        | argparse, click, commander |
| `job`    | Background/scheduled job | Celery, cron script        |
| `lib`    | Library code             | Reusable package           |

### Step 2: Create Sample Directory

```bash
cd datasets/reachability/samples
mkdir gt-NNNN
cd gt-NNNN
```

Use the next available sample ID (check `corpus.json` for the highest); one way to find it is sketched below.

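A sketch for finding the highest ID, assuming `corpus.json` is a top-level JSON array of entries (adjust the jq filter if the index is nested differently):

```bash
jq -r '.[].id' datasets/reachability/corpus.json | sort -V | tail -n 1
```
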
### Step 3: Create Minimal Reproducible Case

**Requirements:**

- Smallest possible code that demonstrates the vulnerability
- Real or realistic vulnerability (use a CVE when possible)
- Clear entrypoint definition
- Deterministic behavior (no network, no randomness)

**Example Python Sample:**

```python
# app.py - gt-0001
from flask import Flask, request
import sqlite3

app = Flask(__name__)

@app.route("/user")
def get_user():
    user_id = request.args.get("id")  # Taint source
    conn = sqlite3.connect(":memory:")
    # SQL injection: user_id flows to the query without sanitization
    result = conn.execute(f"SELECT * FROM users WHERE id = {user_id}")  # Taint sink
    return str(result.fetchall())

if __name__ == "__main__":
    app.run()
```

### Step 4: Define Expected Findings

Create `expected.json` with all expected findings:

```json
{
  "findings": [
    {
      "vuln_key": "CWE-89:pkg:pypi/flask@2.0.0",
      "tier": "tainted_sink",
      "rule_key": "py.sql.injection",
      "sink_class": "sql",
      "location_hint": "app.py:13",
      "notes": "User input from request.args flows to sqlite3.execute"
    }
  ]
}
```

### Step 5: Create SBOM

Generate or create an SBOM for the sample:

```json
{
  "bomFormat": "CycloneDX",
  "specVersion": "1.6",
  "version": 1,
  "components": [
    {
      "type": "library",
      "name": "flask",
      "version": "2.0.0",
      "purl": "pkg:pypi/flask@2.0.0"
    },
    {
      "type": "library",
      "name": "sqlite3",
      "version": "3.39.0",
      "purl": "pkg:pypi/sqlite3@3.39.0"
    }
  ]
}
```

### Step 6: Update Corpus Index

Add an entry to `corpus.json`:

```json
{
  "id": "gt-0001",
  "path": "samples/gt-0001",
  "language": "python",
  "tier": "tainted_sink",
  "scenario": "webapi",
  "expected_count": 1
}
```

### Step 7: Validate Locally

```bash
# Run corpus validation
dotnet test tests/reachability/StellaOps.Reachability.FixtureTests \
  --filter "FullyQualifiedName~CorpusFixtureTests"

# Run benchmark
stellaops bench corpus run --sample gt-0001 --verbose
```

## Tier Guidelines

### Imported Tier Samples

For `imported` tier samples:

- Vulnerability is in a dependency
- No execution path reaches the vulnerable code
- Package is in the lockfile but never called

**Example:** Unused dependency with a known CVE.

### Executed Tier Samples

For `executed` tier samples:

- Vulnerable code is called from an entrypoint
- No user-controlled data reaches the vulnerability
- Static or coverage analysis proves execution

**Example:** Hardcoded SQL query (no injection).

### Tainted→Sink Tier Samples

For `tainted_sink` tier samples:

- User-controlled input reaches the vulnerable code
- Clear source → sink data flow
- Include the sink class taxonomy

**Example:** User input to SQL query, command execution, etc.

## Sink Classes

When contributing `tainted_sink` samples, specify the sink class:

| Sink Class | Description                 | Examples                         |
|------------|-----------------------------|----------------------------------|
| `sql`      | SQL injection               | sqlite3.execute, cursor.execute  |
| `command`  | Command injection           | os.system, subprocess.run        |
| `ssrf`     | Server-side request forgery | requests.get, urllib.urlopen     |
| `path`     | Path traversal              | open(), os.path.join             |
| `deser`    | Deserialization             | pickle.loads, yaml.load          |
| `eval`     | Code evaluation             | eval(), exec()                   |
| `xxe`      | XML external entity         | lxml.parse, ET.parse             |
| `xss`      | Cross-site scripting        | innerHTML, document.write        |

## Quality Criteria

Samples must meet these criteria:

- [ ] **Deterministic**: Same input → same output
- [ ] **Minimal**: Smallest code that demonstrates the issue
- [ ] **Documented**: Clear description and notes
- [ ] **Validated**: Passes local tests
- [ ] **Realistic**: Based on real vulnerability patterns
- [ ] **Self-contained**: No external network calls

## Negative Samples

Include "negative" samples where the scanner should NOT find vulnerabilities:

```json
{
  "id": "gt-0050",
  "name": "Python SQL - Properly Sanitized",
  "tier": "imported",
  "expected_count": 0,
  "notes": "Uses parameterized queries, no injection possible"
}
```

## Review Process

1. Create a PR with the new sample(s)
2. CI runs validation tests
3. Security team reviews expected findings
4. QA team verifies determinism
5. Merge and update the baseline

## Updating Baselines

After adding samples, update the baseline metrics:

```bash
# Generate a new baseline
stellaops bench corpus run --all --output baselines/v1.1.0.json

# Compare to the previous one
stellaops bench corpus compare baselines/v1.0.0.json baselines/v1.1.0.json
```

## FAQ

### How many samples should I contribute?

Start with 2-3 high-quality samples covering different aspects of the same vulnerability class.

### Can I use synthetic vulnerabilities?

Yes, but prefer real CVE patterns when possible. Synthetic samples should document the vulnerability pattern clearly.

### What if my sample has multiple findings?

Include all expected findings in `expected.json`. Multi-finding samples are valuable for testing.

### How do I test tier classification?

Run with verbose output:

```bash
stellaops bench corpus run --sample gt-NNNN --verbose --show-evidence
```

## Related Documentation

- [Tiered Precision Curves](../benchmarks/tiered-precision-curves.md)
- [Reachability Analysis](../product-advisories/14-Dec-2025%20-%20Reachability%20Analysis%20Technical%20Reference.md)
- [Corpus Index Schema](../../datasets/reachability/schemas/corpus-sample.v1.json)