# component_architecture_feedser.md - **Stella Ops Feedser** (2025Q4)
> Evidence collection library for backport detection and binary fingerprinting.
> **Scope.** Library architecture for **Feedser**: patch signature extraction, binary fingerprinting, and evidence collection supporting the four-tier backport proof system. Consumed primarily by Concelier's ProofService layer.
---
## 0) Mission & boundaries
**Mission.** Provide deterministic, cryptographic evidence collection for backport detection. Extract patch signatures from unified diffs and binary fingerprints from compiled code to enable high-confidence vulnerability status determination for packages where upstream fixes have been backported by distro maintainers.
**Boundaries.**
* Feedser is a **library**, not a standalone service. It does not expose REST APIs directly.
* Feedser **does not** make vulnerability decisions. It provides evidence that feeds into VEX statements and Policy Engine evaluation.
* Feedser **does not** store data. Storage is handled by consuming services (Concelier ProofService, Attestor).
* All outputs are **deterministic** with canonical JSON serialization and stable hashing.
---
## 1) Solution & project layout
```
src/Feedser/
├─ StellaOps.Feedser.Core/                   # Patch signature extraction (HunkSig)
│  ├─ HunkSigExtractor.cs                    # Unified diff parser and normalizer
│  ├─ Models/
│  │  ├─ PatchSignature.cs                   # Deterministic patch identifier
│  │  ├─ HunkSignature.cs                    # Individual hunk with normalized content
│  │  └─ DiffParseResult.cs                  # Parse output with file paths and hunks
│  └─ Normalization/
│     └─ WhitespaceNormalizer.cs             # Whitespace/comment stripping
├─ StellaOps.Feedser.BinaryAnalysis/         # Binary fingerprinting engine
│  ├─ BinaryFingerprintFactory.cs            # Factory for fingerprinting strategies
│  ├─ IBinaryFingerprinter.cs                # Fingerprinter interface
│  ├─ Models/
│  │  ├─ BinaryFingerprint.cs                # Fingerprint record with method/value
│  │  └─ FingerprintMatchResult.cs           # Match score and confidence
│  └─ Fingerprinters/
│     ├─ SimplifiedTlshFingerprinter.cs      # TLSH fuzzy hashing
│     └─ InstructionHashFingerprinter.cs     # Instruction sequence hashing
├─ plugins/
│  └─ concelier/                             # Concelier integration plugin
└─ __Tests/
   └─ StellaOps.Feedser.Core.Tests/          # Unit tests
```
---
## 2) External dependencies
* **Concelier ProofService** - Primary consumer; orchestrates four-tier evidence collection
* **Attestor ProofChain** - Consumes evidence for proof blob generation
* **.NET 10** - Runtime target
* No database dependencies (stateless library)
* No external network dependencies
---
## 3) Contracts & data model
### 3.1 Patch Signature (Tier 3 Evidence)
```csharp
public sealed record PatchSignature
{
    public required string Id { get; init; }                      // Deterministic SHA256
    public required string FilePath { get; init; }                // Source file path
    public required IReadOnlyList<HunkSignature> Hunks { get; init; }
    public required string ContentHash { get; init; }             // BLAKE3-256 of normalized content
    public string? CommitId { get; init; }                        // Git commit SHA if available
    public string? UpstreamCve { get; init; }                     // Associated CVE
}

public sealed record HunkSignature
{
    public required int OldStart { get; init; }
    public required int NewStart { get; init; }
    public required string NormalizedContent { get; init; }       // Whitespace-stripped
    public required string ContentHash { get; init; }
}
```
### 3.2 Binary Fingerprint (Tier 4 Evidence)
```csharp
public sealed record BinaryFingerprint
{
    public required string Method { get; init; }         // tlsh, instruction_hash
    public required string Value { get; init; }          // Fingerprint value
    public required string TargetPath { get; init; }     // Binary file path
    public string? FunctionName { get; init; }           // Function if scoped
    public required string Architecture { get; init; }   // x86_64, aarch64, etc.
}

public sealed record FingerprintMatchResult
{
    public required decimal Similarity { get; init; }    // 0.0-1.0
    public required decimal Confidence { get; init; }    // 0.0-1.0
    public required string Method { get; init; }
    public required BinaryFingerprint Query { get; init; }
    public required BinaryFingerprint Match { get; init; }
}
```
### 3.3 Evidence Tier Confidence Levels
| Tier | Evidence Type | Confidence Range | Description |
|------|--------------|------------------|-------------|
| 1 | Distro Advisory | 0.95-0.98 | Official vendor/distro statement |
| 2 | Changelog Mention | 0.75-0.85 | CVE mentioned in changelog |
| 3 | Patch Signature (HunkSig) | 0.85-0.95 | Normalized patch hash match |
| 4 | Binary Fingerprint | 0.55-0.85 | Compiled code similarity |
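
The ranges in the table can be captured as data so that consumers do not hard-code them in several places. The following is a minimal sketch, not the actual Feedser or Concelier implementation: the `EvidenceTierRanges` type and its `Clamp` method are hypothetical names, and clamping a raw score into the tier's range is an assumed policy that this document does not specify.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: the numeric ranges are copied from the table above,
// but the type and member names are illustrative only.
public static class EvidenceTierRanges
{
    private static readonly IReadOnlyDictionary<int, (decimal Min, decimal Max)> Ranges =
        new Dictionary<int, (decimal Min, decimal Max)>
        {
            [1] = (0.95m, 0.98m), // Distro Advisory
            [2] = (0.75m, 0.85m), // Changelog Mention
            [3] = (0.85m, 0.95m), // Patch Signature (HunkSig)
            [4] = (0.55m, 0.85m), // Binary Fingerprint
        };

    // Assumed policy: force a raw score into the range documented for its tier.
    public static decimal Clamp(int tier, decimal rawConfidence)
    {
        if (!Ranges.TryGetValue(tier, out var range))
        {
            throw new ArgumentOutOfRangeException(nameof(tier), tier, "Unknown evidence tier.");
        }

        return Math.Clamp(rawConfidence, range.Min, range.Max);
    }
}
```

Using `decimal` mirrors the `Similarity` and `Confidence` fields in section 3.2 and avoids binary floating-point drift when scores are serialized.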
---
## 4) Core Components
### 4.1 HunkSigExtractor
Parses unified diff format and extracts normalized patch signatures:
```csharp
public interface IHunkSigExtractor
{
    PatchSignature Extract(string unifiedDiff, string? commitId = null);
    IReadOnlyList<PatchSignature> ExtractMultiple(string multiFileDiff);
}
```
**Normalization rules:**
- Strip leading/trailing whitespace
- Normalize line endings to LF
- Remove C-style comments (optional)
- Collapse runs of whitespace to a single space
- Sort hunks by (file_path, old_start) for determinism
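
The normalization rules above can be sketched roughly as follows. This is an illustrative approximation, not the code in `WhitespaceNormalizer.cs`: the `HunkNormalizationSketch` type is hypothetical, the comment stripping is deliberately naive (it does not understand string literals), and dropping lines that end up empty is an assumption.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

// Hypothetical sketch of the normalization rules; names are illustrative.
public static class HunkNormalizationSketch
{
    private static readonly Regex BlockComment = new(@"/\*.*?\*/", RegexOptions.Singleline);
    private static readonly Regex LineComment = new(@"//[^\n]*");
    private static readonly Regex WhitespaceRun = new(@"[ \t]+");

    public static string Normalize(string hunkBody, bool stripComments = false)
    {
        // Normalize line endings to LF.
        var text = hunkBody.Replace("\r\n", "\n").Replace('\r', '\n');

        // Optionally remove C-style comments (naive: ignores string literals).
        if (stripComments)
        {
            text = BlockComment.Replace(text, string.Empty);
            text = LineComment.Replace(text, string.Empty);
        }

        // Per line: collapse whitespace runs, trim both ends,
        // and drop lines left empty (an assumption of this sketch).
        var lines = text.Split('\n')
            .Select(line => WhitespaceRun.Replace(line, " ").Trim())
            .Where(line => line.Length > 0);

        return string.Join("\n", lines);
    }

    // Sort by (file_path, old_start) with ordinal comparison so the order
    // does not depend on the current culture.
    public static IReadOnlyList<(string FilePath, int OldStart)> SortHunks(
        IEnumerable<(string FilePath, int OldStart)> hunks) =>
        hunks.OrderBy(h => h.FilePath, StringComparer.Ordinal)
             .ThenBy(h => h.OldStart)
             .ToList();
}
```

Ordinal string comparison matters for determinism here: culture-sensitive sorting can order the same file paths differently on differently configured hosts.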
### 4.2 BinaryFingerprintFactory
Factory for creating fingerprinters based on binary type and analysis requirements:
```csharp
public interface IBinaryFingerprintFactory
{
    IBinaryFingerprinter Create(FingerprintMethod method);
    IReadOnlyList<IBinaryFingerprinter> GetAll();
}

public interface IBinaryFingerprinter
{
    string Method { get; }
    BinaryFingerprint Extract(ReadOnlySpan<byte> binary, string path);
    FingerprintMatchResult Match(BinaryFingerprint query, BinaryFingerprint candidate);
}
```
**Fingerprinting methods:**
| Method | Description | Confidence | Use Case |
|--------|-------------|------------|----------|
| `tlsh` | TLSH fuzzy hash | 0.75-0.85 | General binary similarity |
| `instruction_hash` | Normalized instruction sequences | 0.55-0.75 | Function-level matching |
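
To show how a fingerprinter plugs into this shape, here is a self-contained sketch with trimmed stand-ins for the contracts. Everything beyond the interface members is an assumption: `ExactHashFingerprinter`, its `sha256_exact` method name, and `FingerprinterRegistry` are hypothetical. The sketch is neither TLSH nor instruction hashing, and the registry is keyed by string, whereas the documented factory takes a `FingerprintMethod` value.

```csharp
using System;
using System.Collections.Generic;
using System.Security.Cryptography;

// Trimmed stand-ins for the contracts in sections 3.2 and 4.2.
public sealed record BinaryFingerprint(string Method, string Value, string TargetPath, string Architecture);

public sealed record FingerprintMatchResult(decimal Similarity, decimal Confidence, string Method);

public interface IBinaryFingerprinter
{
    string Method { get; }
    BinaryFingerprint Extract(ReadOnlySpan<byte> binary, string path);
    FingerprintMatchResult Match(BinaryFingerprint query, BinaryFingerprint candidate);
}

// Hypothetical exact-match fingerprinter: SHA-256 over the raw bytes.
public sealed class ExactHashFingerprinter : IBinaryFingerprinter
{
    public string Method => "sha256_exact";

    public BinaryFingerprint Extract(ReadOnlySpan<byte> binary, string path) =>
        new(Method, Convert.ToHexString(SHA256.HashData(binary)).ToLowerInvariant(), path, "unknown");

    public FingerprintMatchResult Match(BinaryFingerprint query, BinaryFingerprint candidate)
    {
        // Exact hashes either match or they do not; confidence simply mirrors
        // similarity here because the real mapping is not described in this document.
        var score = string.Equals(query.Value, candidate.Value, StringComparison.Ordinal) ? 1.0m : 0.0m;
        return new FingerprintMatchResult(score, score, Method);
    }
}

// Hypothetical registry that resolves a fingerprinter by its method name.
public sealed class FingerprinterRegistry
{
    private readonly Dictionary<string, IBinaryFingerprinter> _byMethod = new(StringComparer.Ordinal);

    public FingerprinterRegistry(IEnumerable<IBinaryFingerprinter> fingerprinters)
    {
        foreach (var fingerprinter in fingerprinters)
        {
            _byMethod[fingerprinter.Method] = fingerprinter;
        }
    }

    public IBinaryFingerprinter Create(string method) =>
        _byMethod.TryGetValue(method, out var fingerprinter)
            ? fingerprinter
            : throw new NotSupportedException($"No fingerprinter registered for '{method}'.");
}
```

Because the fingerprinter in this sketch holds no mutable state, a single instance can be shared across threads.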
---
## 5) Integration with Concelier
Feedser is consumed via `StellaOps.Concelier.ProofService.BackportProofService`:
```
BackportProofService (Concelier)
├─ Tier 1: Query advisory_observations (distro advisories)
├─ Tier 2: Query changelogs via ISourceRepository
├─ Tier 3: Query patches via IPatchRepository + HunkSigExtractor
├─ Tier 4: Query binaries + BinaryFingerprintFactory
└─ Aggregate → ProofBlob with combined confidence score
```
The ProofService orchestrates evidence collection across all tiers and produces cryptographic proof blobs for downstream consumption.
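
This document does not say how the per-tier confidences are combined. Purely as an illustration, the sketch below assumes the simplest plausible rule, where the strongest single piece of evidence determines the combined score. `TierEvidence` and `ProofAggregationSketch` are hypothetical names, and the real `BackportProofService` may aggregate differently.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical evidence item; not a Feedser or Concelier type.
public sealed record TierEvidence(int Tier, string EvidenceType, decimal Confidence);

public static class ProofAggregationSketch
{
    // Assumed rule: the strongest evidence wins; 0 when nothing was found.
    public static decimal CombinedConfidence(IEnumerable<TierEvidence> evidence) =>
        evidence.Select(e => e.Confidence).DefaultIfEmpty(0m).Max();

    // Stable ordering so that a serialized proof does not depend on collection order.
    public static IReadOnlyList<TierEvidence> Ordered(IEnumerable<TierEvidence> evidence) =>
        evidence.OrderBy(e => e.Tier)
                .ThenBy(e => e.EvidenceType, StringComparer.Ordinal)
                .ToList();
}
```

Ordering the evidence before serialization keeps the resulting blob consistent with the determinism requirement, regardless of which tier finished first.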
---
## 6) Security & compliance
* **Determinism**: All outputs use canonical JSON with sorted keys and UTC timestamps
* **Tamper evidence**: BLAKE3-256 content hashes for all signatures
* **No secrets**: Library handles only public patch/binary data
* **Offline capable**: No network dependencies in core library
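
The determinism and tamper-evidence points can be illustrated with a small canonicalization sketch built on `System.Text.Json`. Two caveats: `CanonicalJsonSketch` is a hypothetical helper, not Feedser's serializer, and SHA-256 stands in for BLAKE3-256 because the .NET base class library ships no BLAKE3 implementation. `JsonNode.DeepClone` requires .NET 8 or later.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Security.Cryptography;
using System.Text;
using System.Text.Json.Nodes;

// Hypothetical sketch: sorted-key JSON plus a stable content hash.
public static class CanonicalJsonSketch
{
    public static string Canonicalize(string json) =>
        Sort(JsonNode.Parse(json))?.ToJsonString() ?? "null";

    // Rebuild the tree with object keys in ordinal order; array order is kept.
    private static JsonNode? Sort(JsonNode? node) => node switch
    {
        JsonObject obj => new JsonObject(
            obj.OrderBy(p => p.Key, StringComparer.Ordinal)
               .Select(p => KeyValuePair.Create(p.Key, Sort(p.Value)))),
        JsonArray arr => new JsonArray(arr.Select(Sort).ToArray()),
        // Leaf values are cloned because a node cannot have two parents.
        _ => node?.DeepClone(),
    };

    // SHA-256 is a stand-in for the BLAKE3-256 hash named in this document.
    public static string HashHex(string canonicalJson) =>
        Convert.ToHexString(SHA256.HashData(Encoding.UTF8.GetBytes(canonicalJson))).ToLowerInvariant();
}
```

With this approach two payloads that differ only in property order canonicalize to the same string and therefore produce the same hash.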
---
## 7) Performance targets
* **Patch extraction**: < 10 ms for a typical unified diff (< 1000 lines)
* **Binary fingerprinting**: < 100 ms for a 10 MB ELF binary
* **Memory**: Streaming processing for large binaries; no full file buffering
* **Parallelism**: Thread-safe extractors; concurrent fingerprinting supported
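
The streaming target can be pictured with a chunked hashing sketch that never holds the whole file in memory. It is an assumption-laden illustration: `StreamingHashSketch` is a hypothetical name and SHA-256 is used only because it is available in the base class library. A fuzzy hash such as TLSH would need its own incremental state instead.

```csharp
using System;
using System.Buffers;
using System.IO;
using System.Security.Cryptography;

// Hypothetical sketch: hash a large binary chunk by chunk.
public static class StreamingHashSketch
{
    public static string HashStream(Stream input, int bufferSize = 81920)
    {
        using var hasher = IncrementalHash.CreateHash(HashAlgorithmName.SHA256);

        // Rent a reusable buffer instead of allocating one per call.
        var buffer = ArrayPool<byte>.Shared.Rent(bufferSize);
        try
        {
            int read;
            while ((read = input.Read(buffer, 0, buffer.Length)) > 0)
            {
                hasher.AppendData(buffer, 0, read);
            }
        }
        finally
        {
            ArrayPool<byte>.Shared.Return(buffer);
        }

        return Convert.ToHexString(hasher.GetHashAndReset()).ToLowerInvariant();
    }
}
```

The method keeps all state in locals, so concurrent calls on different streams do not interfere with each other.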
---
## 8) Observability
Because Feedser is a library, its consumers (ProofService) emit the metrics:
* `feedser.hunk_extraction_duration_seconds`
* `feedser.binary_fingerprint_duration_seconds`
* `feedser.fingerprint_match_score{method}`
* `feedser.evidence_tier_confidence{tier}`
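
A consumer could record two of these metrics with `System.Diagnostics.Metrics` along the following lines. The metric names come from the list above; the `FeedserMetricsSketch` class, the meter name, and the choice of histograms as the instrument type are assumptions made for this sketch.

```csharp
using System;
using System.Collections.Generic;
using System.Diagnostics;
using System.Diagnostics.Metrics;

// Hypothetical consumer-side wrapper; not part of Feedser or Concelier.
public sealed class FeedserMetricsSketch : IDisposable
{
    private readonly Meter _meter = new("StellaOps.Feedser.Sketch"); // assumed meter name
    private readonly Histogram<double> _hunkDuration;
    private readonly Histogram<double> _matchScore;

    public FeedserMetricsSketch()
    {
        _hunkDuration = _meter.CreateHistogram<double>("feedser.hunk_extraction_duration_seconds", unit: "s");
        _matchScore = _meter.CreateHistogram<double>("feedser.fingerprint_match_score");
    }

    // Times any extraction delegate and records the elapsed seconds,
    // even when the delegate throws.
    public T TimeHunkExtraction<T>(Func<T> extract)
    {
        var start = Stopwatch.GetTimestamp();
        try
        {
            return extract();
        }
        finally
        {
            _hunkDuration.Record(Stopwatch.GetElapsedTime(start).TotalSeconds);
        }
    }

    // The {method} label from the metric list becomes a tag on the measurement.
    public void RecordMatchScore(string method, double score) =>
        _matchScore.Record(score, new KeyValuePair<string, object?>("method", method));

    public void Dispose() => _meter.Dispose();
}
```

Keeping the instruments in the consuming service leaves the library itself free of telemetry dependencies.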
---
## 9) Testing matrix
* **Unit tests**: HunkSigExtractor parsing, normalization edge cases
* **Fingerprint tests**: Known binary pairs with expected similarity scores
* **Determinism tests**: Same input produces identical output across runs
* **Performance tests**: Large diff/binary processing within targets
---
## 10) Historical note
Concelier was formerly named "Feedser" (see `docs/airgap/airgap-mode.md`). The module was refactored:
- **Feedser** retained as evidence collection library
- **Concelier** became the advisory aggregation service consuming Feedser
---
## Related Documentation
* Concelier architecture: `../concelier/architecture.md`
* Attestor ProofChain: `../attestor/architecture.md`
* Backport proof system: `../../reachability/backport-proofs.md`